1. Introduction and Goals
HtmlSC shall support authors creating digital formats with hyperlinks and integration of images and similar resources.
1.1. Requirements Overview
The overall goal of HtmlSC is to create neat and clear reports, showing errors within HTML files - as shown in the adjoining figure.
1.1.1. Basic Usage
- A user configures the location (directory and filename) of one or more HTML file(s),
- and the corresponding images directory.
- HtmlSC performs various checks on the HTML and
- reports its results either on the console or as an HTML report.

HtmlSC can run from the command line or as a Gradle plugin.
Terminology: What Can Go Wrong in HTML Files?
Apart from purely syntactical errors, many things can go wrong in HTML, especially with respect to hyperlinks, anchors and ids, as those are often maintained manually.
Primary sources of problems are bad links (in technical terms: URIs). For further information, see the background information on URIs.
See DuplicateIdChecker.
Checking and reporting these errors and flaws is the central business requirement of HtmlSC.
Important terms (domain terms) of HTML sanity checking are documented in a (small) domain model.
1.1.2. General Functionality
ID | Functionality | Description |
---|---|---|
G-1 | read HTML file | HtmlSC shall read a single (configurable) HTML file. |
G-2 | Gradle plugin | HtmlSC can be run as a Gradle plugin. |
G-3 | command line usage | HtmlSC can be called from the command line with arguments and options. |
G-4 | configurable output | Output can be configured to go to the console or to a file. |
G-5 | free and open source | All required dependencies shall be compliant with the CC-SA-4 licence. |
G-6 | available via public repositories | Like bintray or jcenter. |
G-7 | configurable to check multiple HTML files | Configure a set of files to be processed in a single run and produce a joint report (useful e.g. for API documentation with many HTML files referencing each other). |
1.1.3. Types of Sanity Checks
ID | Check | Description |
---|---|---|
R-1 | missing image files | Check for all image tags whether the referenced image files exist. See [MissingImageFilesChecker] |
R-2 | broken internal links | Check for all internal links from anchor tags (href="#XYZ") whether the link targets "XYZ" are defined. See [BrokenCrossReferencesChecker] |
R-3 | missing local files | Check whether referenced local files (other HTML files, PDFs or similar) exist. See [MissingLocalResourcesChecker] |
R-4 | duplicate link targets | Check all bookmark definitions (… id="XYZ") whether the ids ("XYZ") are unique. See [DuplicateIdChecker] |
R-5 | malformed links | Check all links for syntactical correctness. |
R-6 | missing alt-attribute | Check image tags for a missing alt attribute. See [MissingImgAltAttributeChecker] |
R-7 | unused images | Check for files in image directories that are not referenced by any of the HTML files in this run. |
R-8 | illegal link targets | Check for malformed or illegal anchors (link targets). |
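
To make check R-2 more concrete, the following is a minimal sketch in Java. It is not the HtmlSC implementation of BrokenCrossReferencesChecker: the class name `BrokenCrossReferenceSketch` and the method `findBrokenInternalLinks` are hypothetical, and the sketch makes the simplifying assumption that attributes are double-quoted and can be found with regular expressions instead of a real HTML parser.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of check R-2; not the actual HtmlSC checker. */
final class BrokenCrossReferenceSketch {

    // Assumption: attributes are written with double quotes. A real checker
    // would use an HTML parser instead of regular expressions.
    private static final Pattern ID = Pattern.compile("\\bid=\"([^\"]*)\"");
    private static final Pattern INTERNAL_HREF = Pattern.compile("\\bhref=\"#([^\"]*)\"");

    /** Returns every internal link target that has no matching id in the document. */
    static List<String> findBrokenInternalLinks(String html) {
        // Pass 1: collect all defined link targets (id="XYZ").
        Set<String> definedIds = new HashSet<>();
        Matcher ids = ID.matcher(html);
        while (ids.find()) {
            definedIds.add(ids.group(1));
        }
        // Pass 2: report every href="#XYZ" whose target was not defined.
        List<String> broken = new ArrayList<>();
        Matcher hrefs = INTERNAL_HREF.matcher(html);
        while (hrefs.find()) {
            String target = hrefs.group(1);
            if (!definedIds.contains(target)) {
                broken.add(target);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        String html = "<h1 id=\"intro\">Intro</h1>"
                + "<a href=\"#intro\">ok</a>"
                + "<a href=\"#missing\">broken</a>";
        System.out.println(findBrokenInternalLinks(html)); // prints [missing]
    }
}
```

The two-pass structure (collect targets first, then compare links) means a link may legitimately appear before its target in the file.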
ID | Check | Description |
---|---|---|
Opt-1 | missing external images | Check externally referenced images for availability |
Opt-2 | broken external links | Check external links for both syntax and availability |
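
The syntax half of such a link check (R-5, and the first part of Opt-2) can be sketched with the JDK's `java.net.URI` parser. This is only an illustration under stated assumptions: the class `LinkSyntaxSketch` and the method `isWellFormed` are hypothetical names, the sketch accepts whatever `java.net.URI` accepts, and the availability check (which would need network access) is deliberately left out.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch of a purely syntactical link check. */
final class LinkSyntaxSketch {

    /** Returns true if the link can be parsed as a URI reference by java.net.URI. */
    static boolean isWellFormed(String link) {
        try {
            new URI(link); // throws URISyntaxException for illegal input
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("http://example.com:42/docs/index.html#INTRO")); // prints true
        System.out.println(isWellFormed("http://example.com/a b.html")); // prints false (unescaped space)
    }
}
```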
1.1.4. Reporting and Output Requirements
ID | Requirement | Description |
---|---|---|
R-1 | various output formats | Checking output in plain text and HTML |
R-2 | output to stdout | HtmlSC can output results on stdout (the console) |
R-3 | configurable file output | HtmlSC can store results in files in configurable directories |
1.2. Quality Goals
Priority | Quality-Goal | Scenario |
---|---|---|
1 | Correctness | Every broken internal link (cross reference) is found. |
1 | Correctness | Every missing local image is found. |
2 | Flexibility | Multiple checking algorithms, report formats and clients. At least Gradle, command-line and a graphical client have to be supported. |
2 | Safety | Content of the files to be checked is never altered. |
2 | Correctness | Correctness of every checker is automatically tested for positive AND negative cases. |
2 | Correctness | Every reporting format is tested: Reports must exactly reflect checking results. |
3 | Performance | Checking a 100 kB HTML file takes less than 10 seconds (excluding Gradle startup). |
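
As an illustration of the scenario "tested for positive AND negative cases", here is a small Java sketch of a duplicate-id check (compare R-4) together with one case of each kind. The names `DuplicateIdSketch`, `findDuplicateIds` and `check` are hypothetical and not taken from HtmlSC, and the regular expression assumes double-quoted id attributes.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: a duplicate-id check with one test case of each kind. */
final class DuplicateIdSketch {

    // Assumption: ids are written as id="..." with double quotes.
    private static final Pattern ID = Pattern.compile("\\bid=\"([^\"]*)\"");

    /** Returns all ids that are defined more than once, in order of first repetition. */
    static Set<String> findDuplicateIds(String html) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        Matcher m = ID.matcher(html);
        while (m.find()) {
            String id = m.group(1);
            if (!seen.add(id)) { // add() returns false if the id was already seen
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    public static void main(String[] args) {
        // Case 1: a document with a defect must yield exactly that finding.
        String withDuplicate = "<h1 id=\"intro\">A</h1><h2 id=\"intro\">B</h2><p id=\"other\">C</p>";
        check(findDuplicateIds(withDuplicate).equals(Set.of("intro")), "duplicate id not reported");

        // Case 2: a clean document must yield no findings at all.
        String clean = "<h1 id=\"intro\">A</h1><h2 id=\"details\">B</h2>";
        check(findDuplicateIds(clean).isEmpty(), "false finding on clean document");

        System.out.println("both cases passed");
    }
}
```

Testing the clean document as well guards against a checker that reports too much, which would make reports noisy and less trustworthy.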
1.3. Stakeholder
Role | Description | Goal, Intention |
---|---|---|
Documentation author | writes documentation with HTML output | wants to check that the resulting document contains good links and image references |
arc42 user | uses the arc42 template for architecture documentation | wants a small but practical example of how to apply arc42 |
aim42 contributor | contributes to the aim42 method-guide | wants to check generated HTML code to ensure links and images are correct during the (Gradle-based) build process |
software developer | | wants an example of pragmatic architecture documentation and arc42 usage |
1.4. Background Information on URIs
The generic structure of a Uniform Resource Identifier consists of the following parts: [type][://][subdomain][domain][port][path][file][query][hash]
An example, visualized:
The java.net.URL class contains a generic parser for URLs and URIs.
See the following snippet, taken from the unit test class URLUtilTest.groovy:
```groovy
@Test
public void testGenericURISyntax() {
    // based upon an example from the Oracle(tm) Java tutorial:
    // http://docs.oracle.com/javase/tutorial/networking/urls/urlInfo.html
    def aURL =
        new URL("http://example.com:42/docs/tutorial/index.html?name=aim42#INTRO");
    aURL.with {
        assert getProtocol() == "http"
        assert getAuthority() == "example.com:42"
        assert getHost() == "example.com"
        assert getPort() == 42
        assert getPath() == "/docs/tutorial/index.html"
        assert getQuery() == "name=aim42"
        assert getRef() == "INTRO"
    }
}
```
URIs are used to reference other resources. For HtmlSC it is useful to distinguish between internal (== local) and external references:
- Internal references, a.k.a. Cross-References
- External references
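
The distinction above could be sketched in Java as follows. The rule used here is an assumption for illustration only, not HtmlSC's documented behaviour: a reference counts as external if it has a scheme other than `file`, and as internal otherwise. The names `ReferenceKindSketch`, `Kind` and `classify` are hypothetical. Note that under this rule a scheme-less reference such as `//example.com/x` would be classified as internal.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch: classify a reference as internal (local) or external. */
final class ReferenceKindSketch {

    enum Kind { INTERNAL, EXTERNAL, MALFORMED }

    static Kind classify(String reference) {
        try {
            URI uri = new URI(reference);
            // URI.isAbsolute() is true exactly when the URI has a scheme.
            // Assumption: everything with a scheme other than "file" is external.
            if (uri.isAbsolute() && !"file".equalsIgnoreCase(uri.getScheme())) {
                return Kind.EXTERNAL;
            }
            return Kind.INTERNAL;
        } catch (URISyntaxException e) {
            return Kind.MALFORMED;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("#INTRO"));                          // prints INTERNAL
        System.out.println(classify("images/logo.png"));                 // prints INTERNAL
        System.out.println(classify("http://example.com:42/docs/tutorial/index.html?name=aim42#INTRO")); // prints EXTERNAL
    }
}
```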
1.4.1. Intra-Document URIs
a file… ref can be an internal link, or a URI without protocol…
1.4.2. References on URIs and HTML Syntax
- IETF RFC-2396 on URI Syntax: The fundamental reference!