Posted to dev@any23.apache.org by lewismc <gi...@git.apache.org> on 2015/03/30 18:43:31 UTC

[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

GitHub user lewismc opened a pull request:

    https://github.com/apache/any23/pull/17

    ANY23-247 FIX Attribute name itemscope associated with an element type html must be followed by the ' = ' character.

    Hi Folks,
    This PR fixes the issue locally. I am getting clean builds again after introducing the new MissingItemscopeAttributeValueRule class.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lewismc/any23 ANY23-247

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/any23/pull/17.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17
    
----
commit 5ac2307a0245f06f07cbdbe300bc8608f73b1ba1
Author: Lewis John McGibbney <le...@jpl.nasa.gov>
Date:   2015-03-30T16:43:25Z

    ANY23-247 FIX Attribute name itemscope associated with an element type html must be followed by the ' = ' character.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by ansell <gi...@git.apache.org>.
Github user ansell commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-201564662
  
    I tested this pull request and it has a few failing tests for me. I know that the Any23 master hasn't been perfect for its test record (mostly due to unreliable remote queries), but I haven't been watching recently to know which tests are expected to fail.



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-88051079
  
    @ansell done, the branch is now 2 ahead of master



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-201537723
  
    hi @ansell OK I've added in the correct rule and fix as well as a test to verify that empty itemscope values are identified and fixed. 
    Whilst debugging this, however, I found that the core issue persists. The reason is that ```RDFa11Extractor extends BaseRDFExtractor```, which inherits the [parser function inputstream parameter](https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java#L105). This input stream is not the 'fixed' stream but the raw document. 
    The only ways around this that I can think of are for us to either
     * refactor the [RDFa1.1Extractor](https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/rdfa/RDFa11Extractor.java) such that it extends [TagSoupDomExtractor](https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/api/src/main/java/org/apache/any23/extractor/Extractor.java#L60) as opposed to (eventually) the [ContentExtractor](https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/api/src/main/java/org/apache/any23/extractor/Extractor.java#L44), or
     * undertake a mass refactoring which essentially removes the [ContentExtractor](https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/api/src/main/java/org/apache/any23/extractor/Extractor.java#L44) altogether... this would provide us with a much more flexible and adaptable extraction framework IMHO.
    
    What do you think?
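
To make the "fixed stream versus raw document" point concrete, here is a minimal sketch using only JDK classes. It is not Any23 code: the class name `FixedStreamSketch` and its methods are hypothetical. The idea is to repair empty `itemscope` attributes on the DOM and then serialise that DOM again, so that a stream-based parser reads the repaired markup instead of the raw input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Any23.
final class FixedStreamSketch {

    /** Gives every empty 'itemscope' attribute the value "itemscope"; returns the number of changes. */
    static int fixEmptyItemscope(Document doc) {
        int fixed = 0;
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (el.hasAttribute("itemscope") && el.getAttribute("itemscope").isEmpty()) {
                el.setAttribute("itemscope", "itemscope");
                fixed++;
            }
        }
        return fixed;
    }

    /** Serialises the DOM to bytes with the JDK's identity transformer. */
    static byte[] serialize(Document doc) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The stream a downstream, stream-based parser would read: the fixed DOM, not the raw input. */
    static InputStream toInputStream(Document doc) {
        return new ByteArrayInputStream(serialize(doc));
    }

    /** Builds a tiny stand-in document containing one empty itemscope attribute. */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element div = doc.createElement("div");
            div.setAttribute("itemscope", "");
            doc.appendChild(div);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whether Any23 should re-serialise the DOM like this, or instead hand the DOM directly to the RDFa extractor, is exactly the design question raised above; the sketch only shows that the repaired document can be turned back into a stream.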



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/any23/pull/17#discussion_r27442017
  
    --- Diff: core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *  http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.any23.validator.rule;
    +
    +import org.apache.any23.validator.DOMDocument;
    +import org.apache.any23.validator.Fix;
    +import org.apache.any23.validator.Rule;
    +import org.apache.any23.validator.RuleContext;
    +
    +/**
    + * This fixes missing attribute values for the 'itemscope' attribute, 
    + * which may be associated with <div> nodes.
    + * Typically when such a snippet of XHTML is fed through the 
    + * {@link org.apache.any23.extractor.rdfa.RDFa11Extractor}, and
    + * subsequently to Sesame's {@link org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser},
    + * it will result in the following behavior. 
    + * <pre>
    + * {@code
    + * [Fatal Error] :23:15: Attribute name "itemscope" associated with an element type "div" must be followed by the ' = ' character.
    + * }
    + * </pre>
    + * This Fix is an effort to mitigate against that happening. 
    + *
    + */
    +public class MissingItemscopeAttributeValueRule implements Fix {
    --- End diff --
    
    I looked for where it is registered during a single document extraction. My understanding was that validations and fixes are registered and activated as part of the extraction parameters. If a vanilla SingleDocumentExtraction is invoked, as in Any23Test, then the fixes and validations are activated by default.



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-88056599
  
    When debugging this, a good place to set a breakpoint is
    https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/SingleDocumentExtraction.java#L253
    The parse still fails in the RDFa 1.1 parser with the following error:
    ```
      [Fatal Error] :23:15: Attribute name "itemscope" associated with an element type "div" must be followed by the ' = ' character.
    [2015-03-31 04:46:46,618]DEBUG544766[main] - org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:488) - html-rdfa11: Error while parsing RDF document.
    ```
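
The fatal error above is what an XML parser reports for an attribute written without a value. As a minimal illustration with the JDK's own XML parser (not the Semargl or Any23 code path; the class name `ItemscopeXmlCheck` is made up for this sketch), a bare `itemscope` is rejected as not well-formed, while `itemscope="itemscope"` is accepted:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Any23.
final class ItemscopeXmlCheck {

    /** Returns true if the JDK's XML parser accepts the markup as well-formed XML. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error] ..." to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Raised for input that is not well-formed, such as an attribute without '='.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML5-style boolean attribute: valid HTML, but not well-formed XML.
        System.out.println(isWellFormedXml("<div itemscope itemtype=\"http://schema.org/Person\"/>"));
        // The same element with an explicit attribute value.
        System.out.println(isWellFormedXml("<div itemscope=\"itemscope\" itemtype=\"http://schema.org/Person\"/>"));
    }
}
```

This is why a fix applied only to the DOM does not help if the RDFa parser is later handed the raw, unfixed bytes.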



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by ansell <gi...@git.apache.org>.
Github user ansell commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-201545776
  
    The system does seem a little too complex for our purposes and isn't usable because of that.
    
    Removing generics would be the first step IMO as there are too many rawtypes definitions which indicate generics are being used badly.
    
    After that, ContentExtractor could perhaps be removed completely rather than refitted into the process, and the parser should always be set to parse as far as is practical for our purposes.
    
    It is a little strange that there isn't a buffered, markable, InputStream provided for all of the steps to reuse as necessary rather than pushing a raw InputStream or other source into different extractors.
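
As a rough sketch of the "buffered, markable InputStream" idea, using only `java.io` (Java 9+ for `readAllBytes`): the class and method names below are hypothetical and nothing here reflects how Any23 currently passes input around. One stream is marked at the start and rewound after each step, so every step sees the same bytes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Any23.
final class ReusableStreamSketch {

    /** Wraps the content in a buffered stream and marks the start so each step can rewind. */
    static InputStream markedAtStart(byte[] content) {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(content));
        // The read limit must cover the whole content, otherwise reset() may fail later.
        in.mark(content.length + 1);
        return in;
    }

    /** Stands in for one extraction step: reads everything, then rewinds for the next step. */
    static String readAndRewind(InputStream markable) {
        try {
            String text = new String(markable.readAllBytes(), StandardCharsets.UTF_8);
            markable.reset();
            return text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] html = "<div itemscope></div>".getBytes(StandardCharsets.UTF_8);
        InputStream shared = markedAtStart(html);
        String seenByFirstStep = readAndRewind(shared);
        String seenBySecondStep = readAndRewind(shared);
        System.out.println(seenByFirstStep.equals(seenBySecondStep));
    }
}
```

The trade-off is memory: the mark's read limit has to cover the whole document, so very large inputs would need a different strategy.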



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by ansell <gi...@git.apache.org>.
Github user ansell commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-87894632
  
    Could you rebase your branch onto upstream master and try again? 
    
    The place where the error started to become visible as a test failure (when I started rethrowing an exception that was being swallowed incorrectly) is on the current master, but your master branch is 4 commits behind that so the test will still silently succeed on your branch.



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by ansell <gi...@git.apache.org>.
Github user ansell commented on a diff in the pull request:

    https://github.com/apache/any23/pull/17#discussion_r27446443
  
    --- Diff: core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java ---
    +public class MissingItemscopeAttributeValueRule implements Fix {
    --- End diff --
    
    There is a hardcoded set in DefaultValidator.loadDefaultRules, but I can't find any place that is doing classpath scanning there.
    
    I also do not understand the relationship between Rule and Fix. In the DefaultValidator, there are either Rule, or Rule+Fix, not just a Fix like you have here.
    
    I will look into it further when I get a chance.
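
To illustrate the Rule versus Rule+Fix pairing described above, here is a toy registry. It is a hypothetical sketch and does not reproduce Any23's `DefaultValidator`, `Rule` or `Fix` types: rules are plain predicates over the document text, fixes are string rewrites, and the default set is hard-coded in one factory method. In a design like this a fix can only be registered alongside a rule, so a class that is only a fix has nowhere to plug in.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical toy registry, not Any23's validator.
final class RuleFixRegistrySketch {

    // Each detection rule maps to its fix, or to null when the rule only reports a problem.
    private final Map<Predicate<String>, UnaryOperator<String>> rules = new LinkedHashMap<>();

    /** Registers a detect-only rule. */
    void addRule(Predicate<String> rule) {
        rules.put(rule, null);
    }

    /** Registers a rule together with the fix to run when the rule matches. */
    void addRule(Predicate<String> rule, UnaryOperator<String> fix) {
        rules.put(rule, fix);
    }

    /** Hard-coded default set, analogous in spirit to a loadDefaultRules method. */
    static RuleFixRegistrySketch withDefaultRules() {
        RuleFixRegistrySketch validator = new RuleFixRegistrySketch();
        validator.addRule(
                doc -> doc.contains(" itemscope>"),
                doc -> doc.replace(" itemscope>", " itemscope=\"itemscope\">"));
        return validator;
    }

    /** Runs every rule in registration order and applies the paired fix when a rule matches. */
    String validate(String document) {
        String current = document;
        for (Map.Entry<Predicate<String>, UnaryOperator<String>> entry : rules.entrySet()) {
            if (entry.getKey().test(current) && entry.getValue() != null) {
                current = entry.getValue().apply(current);
            }
        }
        return current;
    }
}
```

The string matching is deliberately naive (it only catches `itemscope` immediately before `>`); a real rule would work on the DOM.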



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-201573785
  
    ACK @ansell , master branch is unstable with the following test failures
    
    https://builds.apache.org/view/A-D/view/Any23/job/Any23-trunk/1466/#showFailuresLink
    
    If you can reproduce this locally (or up until your test build fails within core with 3 failing tests) then that is the 'expected' behaviour right now. The Microdata test is directly related to the issue we are now discussing here. 
    
    This issue is the most pressing one for Any23 right now. IMHO it is a complete blocker to us releasing Any23 1.2.



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/any23/pull/17



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/any23/pull/17#discussion_r27444885
  
    --- Diff: core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java ---
    +public class MissingItemscopeAttributeValueRule implements Fix {
    --- End diff --
    
    Ack
    
    On Monday, March 30, 2015, Peter Ansell <no...@github.com> wrote:
    
    > In
    > core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java
    > <https://github.com/apache/any23/pull/17#discussion_r27442717>:
    >
    > It may be done using a classpath scan. I will look into it further.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/any23/pull/17/files#r27442717>.
    >
    
    
    -- 
    *Lewis*




[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-201550510
  
    I agree. Stepping through this in the debugger made me think the same.
    I think it would be different if Any23 were meant to be a PURE implementation, but that
    is clearly not the case. Any23 fits in best when it can be used to extract
    semantics from any old crap input that it is fed. Parsers and extractors
    *should not* fail when there is a piece of crap input HTML. Currently,
    that's exactly what happens and it is extremely limiting.
    
    I would like to propose that this PR is committed to master as is, and that we then
    open a brand new issue which acts on exactly your comments: refactoring out
    ContentExtractor, reusing the input stream which has been fixed, etc.
    
    Any thoughts Peter? Thanks for the quick response.
    
    On Friday, March 25, 2016, Peter Ansell <no...@github.com> wrote:
    
    > The system does seem a little too complex for our purposes and isn't
    > usable because of that.
    >
    > Removing generics would be the first step IMO as there are too many
    > rawtypes definitions which indicate generics are being used badly.
    >
    > ContentExtractor may be able to be completely removed instead of being
    > refitted into the process after that and the parser should always be set to
    > parse as far as practical for our purposes.
    >
    > It is a little strange that there isn't a buffered, markable, InputStream
    > provided for all of the steps to reuse as necessary rather than pushing a
    > raw InputStream or other source into different extractors.
    >
    > —
    > You are receiving this because you authored the thread.
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/any23/pull/17#issuecomment-201545776>
    >
    
    
    -- 
    *Lewis*




[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-202702530
  
    @ansell any further comments here? I will try to get to work on the larger issue this week. 



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-186536685
  
    @ansell the line I am getting the error on is way down in semargl here
    https://github.com/levkhomich/semargl/blob/ee8b35fc330deae6cb623fa3c57f583f3684bb76/rdfa/src/main/java/org/semarglproject/rdf/rdfa/RdfaParser.java#L1130
    I am going to investigate this issue again this weekend as it is high time we got Any23 master back to successful healthy builds.



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/any23/pull/17#discussion_r27444925
  
    --- Diff: core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java ---
    +public class MissingItemscopeAttributeValueRule implements Fix {
    --- End diff --
    
    Everything I've uploaded to the patch is what I have coded. There is no
    other black magic on my end to get this invoked.
    
    On Monday, March 30, 2015, Lewis John Mcgibbney <le...@gmail.com>
    wrote:
    
    > Ack
    >
    > On Monday, March 30, 2015, Peter Ansell <notifications@github.com> wrote:
    >
    >> In
    >> core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java
    >> <https://github.com/apache/any23/pull/17#discussion_r27442717>:
    >>
    >> It may be done using a classpath scan. I will look into it further.
    >>
    >> —
    >> Reply to this email directly or view it on GitHub
    >> <https://github.com/apache/any23/pull/17/files#r27442717>.
    >>
    >
    >
    > --
    > *Lewis*
    >
    >
    
    -- 
    *Lewis*




[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-88056267
  
    By the way @ansell, one observation: whenever we attempt to infer the document language, we never succeed. It always returns null; on every single occasion I get back null.



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by ansell <gi...@git.apache.org>.
Github user ansell commented on a diff in the pull request:

    https://github.com/apache/any23/pull/17#discussion_r27442717
  
    --- Diff: core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java ---
    +public class MissingItemscopeAttributeValueRule implements Fix {
    --- End diff --
    
    It may be done using a classpath scan. I will look into it further.



[GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...

Posted by ansell <gi...@git.apache.org>.
Github user ansell commented on a diff in the pull request:

    https://github.com/apache/any23/pull/17#discussion_r27437088
  
    --- Diff: core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java ---
    +public class MissingItemscopeAttributeValueRule implements Fix {
    --- End diff --
    
    How is this class recognised or instantiated? META-INF/services/ or another method?
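
For reference, discovery through `META-INF/services/` would go via `java.util.ServiceLoader`. The sketch below shows only that mechanism; `SketchFix` is a hypothetical stand-in interface, and nothing in this thread establishes that Any23 loads its fixes this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical example of ServiceLoader-based discovery, not Any23 code.
final class FixDiscoverySketch {

    /** Hypothetical stand-in for a fix interface. */
    public interface SketchFix {
        String name();
    }

    /**
     * Lists the providers named in any META-INF/services file for SketchFix on the classpath.
     * The file is named after the interface's binary name and contains one implementation
     * class name per line; with no such file the result is simply empty.
     */
    static List<String> discoverFixNames() {
        List<String> names = new ArrayList<>();
        for (SketchFix fix : ServiceLoader.load(SketchFix.class)) {
            names.add(fix.name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(discoverFixNames());
    }
}
```

With this mechanism, adding a new fix class is not enough on its own: it is only picked up once it is also listed in the corresponding services file.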

