Link Checker and Transformer

The link checker validates all internal and external links authored on a content page. The AEM link checker is event based and is triggered whenever content is updated.

Using the link checker is not advisable for large repositories whose links change frequently.

Because it is event based, every creation or modification of a node inside the /content folder structure creates a corresponding mapping under /var/linkchecker.

The link checker is responsible for:

  1. Validating both external and internal links authored on pages.
  2. Showing a list of all external links authored on pages.
  3. Performing link rewriting / transformation.

Internal Links

Internal links point to AEM content pages on the same instance, with paths starting at /content, e.g. /content/<projects>/us/en/home.html

Internal links are validated as soon as they are added or updated on a page.

External Links

External links point outside the AEM instance or to a different domain, e.g. https://www.google.com

External links are validated by checking their syntax and then their availability.
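The distinction between the two link types can be sketched in plain Java. This is only an illustration of the rules described above, not the AEM implementation: the class name LinkKindSketch and its classify method are hypothetical, the syntax check is simplified to "has a scheme and a host", and the availability check is left out because it needs network access.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative only: classifies an authored link following the rules in the text. */
final class LinkKindSketch {

    enum Kind { INTERNAL, EXTERNAL, INVALID }

    static Kind classify(String link) {
        if (link == null || link.trim().isEmpty()) {
            return Kind.INVALID;
        }
        // Internal links point at content of the same instance under /content.
        if (link.startsWith("/content/")) {
            return Kind.INTERNAL;
        }
        try {
            URI uri = new URI(link);
            // Simplified syntax check: an external link needs a scheme and a host.
            boolean absolute = uri.getScheme() != null && uri.getHost() != null;
            return absolute ? Kind.EXTERNAL : Kind.INVALID;
        } catch (URISyntaxException e) {
            // Malformed input, for example a URL containing a space.
            return Kind.INVALID;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("/content/practice/us/en/home.html"));
        System.out.println(classify("https://www.google.com"));
        System.out.println(classify("https://exa mple.com"));
    }
}
```

A real checker would follow a successful syntax check with an HTTP request to confirm that the target is reachable.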

Broken links on author page

  • On author, broken internal and external links are displayed as broken links, highlighted in red.

Broken links on publish page

  • On publish, broken internal and external links appear as plain text; the link itself is removed from the markup.

External Link Checker User Interface

The following URL provides information about the internal and external links authored on AEM pages:

http://localhost:4502/etc/linkchecker.html

In this example, two teaser components were dragged onto a page. An internal link was authored on one component and an external link on the other.

The link checker interface lists an entry for the teaser component with the incorrect external URL https://ww.test.com.

It also lists an entry for the teaser component on which the internal URL /content/test/practice, which does not exist, was authored. For an internal link, the interface shows the path of the component under which the link was authored rather than the authored link itself.

The following URL returns the list of all links validated by the link checker:
http://localhost:4502/var/linkchecker.list.json?_dc=1667974306388
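The same list can be fetched from code. The sketch below uses the JDK HttpClient (Java 11+) and makes several assumptions that are not from the article: a local author instance with the default admin/admin credentials, basic authentication, and no _dc parameter (it looks like a cache-busting timestamp). The class name LinkCheckerListSketch is hypothetical, and the response body is printed as-is because its JSON structure is not described here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: requests the link checker list from a local AEM author. */
final class LinkCheckerListSketch {

    /** Builds a GET request for /var/linkchecker.list.json with a Basic auth header. */
    static HttpRequest buildRequest(String baseUrl, String user, String password) {
        String credentials = user + ":" + password;
        String token = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/var/linkchecker.list.json"))
                .header("Authorization", "Basic " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed values for a local development instance.
        HttpRequest request = buildRequest("http://localhost:4502", "admin", "admin");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```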

Link checker and transformer service configurations

The following system console configurations control the link checker and transformer:

Day CQ Link Checker Service

This service validates the syntax of external links. The link checker is enabled by default on all instances. Override the Day CQ Link Checker Service configuration for any project-specific customization.

Scheduler Period: the interval, in seconds, at which this service is called repeatedly.

Link Check Override Patterns: patterns for links that should be excluded from validation. For example, an entry of ^http://www.google.com makes the link checker skip validation of http://www.google.com.
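How such a pattern behaves can be tried out with java.util.regex. This sketch assumes the override patterns are ordinary regular expressions matched against the link; how AEM applies them internally is not covered here, and the class name OverridePatternSketch is hypothetical. Note that the dots are escaped in this example, since an unescaped dot matches any character.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: decides whether a link matches any override pattern. */
final class OverridePatternSketch {

    static boolean isOverridden(String link, List<Pattern> overridePatterns) {
        for (Pattern pattern : overridePatterns) {
            // find() honours the ^ anchor, so the pattern must match at the start.
            if (pattern.matcher(link).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = List.of(Pattern.compile("^http://www\\.google\\.com"));
        System.out.println(isOverridden("http://www.google.com/search", patterns));
        System.out.println(isOverridden("https://www.google.com", patterns));
    }
}
```

With this pattern the http link is excluded, while the https variant of the same host is not, so each scheme needs its own entry or a pattern such as ^https?://.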

Day CQ Link Checker Task

This task updates the status of each link from pending to valid or invalid. Once the task has run, the changes appear at the following URL:

http://localhost:4502/etc/linkchecker.html

The Scheduler Period property determines how often the task runs; with the configuration used here, it runs every 60 minutes.

Day CQ Link Checker Transformer

This service transforms URLs with the help of the link checker.

For example, the link transformer rewrites a URL from /content/practice/us/en/test.html to /test.html

The Disable Checking property disables link checking completely for a particular AEM instance.

The Disable Rewriting property stops URL rewriting / transformation.

linkcheckertransformer.rewriteElements lists tag and attribute pairs, written as tag:attribute, whose attribute values should be transformed.
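To make the tag:attribute format concrete, the sketch below parses a few entries into a lookup table. It only illustrates the notation; the class RewriteElementsSketch and the sample entries are hypothetical and do not reflect how AEM reads the property.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative only: turns tag:attribute entries into a tag -> attributes map. */
final class RewriteElementsSketch {

    static Map<String, Set<String>> parse(String... entries) {
        Map<String, Set<String>> byTag = new LinkedHashMap<>();
        for (String entry : entries) {
            int colon = entry.indexOf(':');
            if (colon <= 0 || colon == entry.length() - 1) {
                continue; // not in tag:attribute form, ignore it
            }
            String tag = entry.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String attribute = entry.substring(colon + 1).trim().toLowerCase(Locale.ROOT);
            byTag.computeIfAbsent(tag, key -> new LinkedHashSet<>()).add(attribute);
        }
        return byTag;
    }

    /** True when the given attribute of the given tag is configured for rewriting. */
    static boolean shouldRewrite(Map<String, Set<String>> byTag, String tag, String attribute) {
        return byTag.getOrDefault(tag.toLowerCase(Locale.ROOT), Set.of())
                .contains(attribute.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> config = parse("a:href", "img:src", "img:srcset");
        System.out.println(config);
        System.out.println(shouldRewrite(config, "IMG", "src"));
    }
}
```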

Disable Link Checker using code

There are two ways to disable the link checker in code for a particular tag:

  1. Add the x-cq-linkchecker="valid" attribute to an anchor <a> or other tag. The link checker will treat the link as valid.

<a href="https://ww.test.com" x-cq-linkchecker="valid"></a>

  2. Add the x-cq-linkchecker="skip" attribute to an anchor <a> or other tag. The link checker will not validate this link.

<a href="https://ww.test.com" x-cq-linkchecker="skip"></a>

Link Transformer / rewriter Implementation

As the name suggests, the link transformer transforms links on page load. This can be achieved with a rewriter configuration and a custom transformer class.

For example, an image src URL such as /content/dam/practice/us/en/book.png can be transformed by a custom transformer to http://cdn/assets/practice/us/en/book.png, to http://localhost:4502/content/dam/practice/us/en/book.png, or to any other mapped URL.

Similarly, a transformer can shorten a content URL from /content/practice/us/en/home.html to /us/en/home.html.
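Both mappings come down to replacing a known path prefix. The following sketch shows that idea in isolation, using the example paths from above; the class UrlMappingSketch, its method names, and the choice to leave non-matching paths untouched are assumptions made for illustration.

```java
/** Illustrative only: prefix-based URL mappings for pages and assets. */
final class UrlMappingSketch {

    static final String CONTENT_ROOT = "/content/practice";
    static final String DAM_ROOT = "/content/dam";
    static final String CDN_ROOT = "http://cdn/assets";

    /** /content/practice/us/en/home.html -> /us/en/home.html */
    static String shortenPagePath(String path) {
        // Require the trailing slash so /content/practice-old is not touched.
        if (path == null || !path.startsWith(CONTENT_ROOT + "/")) {
            return path;
        }
        return path.substring(CONTENT_ROOT.length());
    }

    /** /content/dam/practice/us/en/book.png -> http://cdn/assets/practice/us/en/book.png */
    static String toCdnUrl(String path) {
        if (path == null || !path.startsWith(DAM_ROOT + "/")) {
            return path;
        }
        return CDN_ROOT + path.substring(DAM_ROOT.length());
    }

    public static void main(String[] args) {
        System.out.println(shortenPagePath("/content/practice/us/en/home.html"));
        System.out.println(toCdnUrl("/content/dam/practice/us/en/book.png"));
    }
}
```

Matching on the prefix only, rather than replacing the string wherever it occurs, keeps a path such as /content/other/content/practice/page.html intact.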

Follow the steps below to create a custom link transformer:

  1. Create a rewriter folder beneath the config folder so that the rewriter works on both author and publish.

The following configuration is required for the link transformer to work:

  • Set the enabled property to true.
  • Provide project-specific paths.
  • Add practice-linkrewriter to transformerTypes.

Important note: the transformerTypes value practice-linkrewriter must be unique across the AEM instance. The same name is used in MyRewriterTransformer.java to map the class to this configuration.

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    jcr:primaryType="nt:unstructured"
    contentTypes="[text/html]"
    enabled="{Boolean}true"
    generatorType="htmlparser"
    order="1"
    paths="[/content/practice]"
    serializerType="htmlwriter"
    transformerTypes="[linkchecker,versioned-clientlibs,practice-linkrewriter]">
    <generator-htmlparser
        jcr:primaryType="nt:unstructured"
        includeTags="[A,/A,LINK,IMG]"/>
</jcr:root>

Note: This configuration can also be copied from the /libs/cq/config/rewriter/default node.


  2. Create the custom transformer class MyRewriterTransformer, which implements Transformer and TransformerFactory:

package com.javadoubts.core.transformer;

import org.apache.commons.lang3.StringUtils;
import org.apache.sling.rewriter.ProcessingComponentConfiguration;
import org.apache.sling.rewriter.ProcessingContext;
import org.apache.sling.rewriter.Transformer;
import org.apache.sling.rewriter.TransformerFactory;
import org.osgi.service.component.annotations.Component;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

import java.io.IOException;

@Component(
  immediate = true,
  service = TransformerFactory.class,
  property = {
    "pipeline.type=practice-linkrewriter"
  }
)
public class MyRewriterTransformer implements Transformer, TransformerFactory {

  private ContentHandler contentHandler;

  @Override
  public Transformer createTransformer() {
    return new MyRewriterTransformer();
  }

  @Override
  public void init(ProcessingContext processingContext, ProcessingComponentConfiguration processingComponentConfiguration) throws IOException {

  }

  @Override
  public void setContentHandler(ContentHandler handler) {
    this.contentHandler = handler;
  }

  @Override
  public void dispose() {

  }

  @Override
  public void setDocumentLocator(Locator locator) {
    contentHandler.setDocumentLocator(locator);
  }

  @Override
  public void startDocument() throws SAXException {
    contentHandler.startDocument();
  }

  @Override
  public void endDocument() throws SAXException {
    contentHandler.endDocument();
  }

  @Override
  public void startPrefixMapping(String prefix, String uri) throws SAXException {
    contentHandler.startPrefixMapping(prefix, uri);
  }

  @Override
  public void endPrefixMapping(String prefix) throws SAXException {
    contentHandler.endPrefixMapping(prefix);
  }

  /*
    Main hook for URL rewriting: called for every element
    start tag that passes through the pipeline.
  */
  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
    AttributesImpl modifiedAttributes = new AttributesImpl(atts);
    int hrefIndex = atts.getIndex("href");
    int srcIndex = atts.getIndex("src");

    if (hrefIndex > -1 && qName.equalsIgnoreCase("a")) {
      // Trim /content/practice/us/en/home to /us/en/home on page load
      modifiedAttributes.setValue(hrefIndex, modifiedUrl(atts.getValue(hrefIndex)));
    } else if (srcIndex > -1 && qName.equalsIgnoreCase("img")) {
      // Prepend http://localhost:4502 to the asset URL
      modifiedAttributes.setValue(srcIndex, "http://localhost:4502" + modifiedUrl(atts.getValue(srcIndex)));
    }

    // Always forward the element. Without this call, any element that is
    // not an <a href> or <img src> (for example LINK) is dropped from the output.
    contentHandler.startElement(uri, localName, qName, modifiedAttributes);
  }

  public static String modifiedUrl(String path) {
    if (StringUtils.isBlank(path)) {
      return path; // blank, return it as is.
    }
    // Strip only the leading /content/practice prefix, not later occurrences.
    return StringUtils.removeStart(path, "/content/practice");
  }

  @Override
  public void endElement(String uri, String localName, String qName) throws SAXException {
    contentHandler.endElement(uri, localName, qName);
  }

  @Override
  public void characters(char[] ch, int start, int length) throws SAXException {
    contentHandler.characters(ch, start, length);
  }

  @Override
  public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
    contentHandler.ignorableWhitespace(ch, start, length);
  }

  @Override
  public void processingInstruction(String target, String data) throws SAXException {
    contentHandler.processingInstruction(target, data);
  }

  @Override
  public void skippedEntity(String name) throws SAXException {
    contentHandler.skippedEntity(name);
  }
}

  3. Author an anchor link and a text component on the page.

On page load, the MyRewriterTransformer class gets called multiple times, once for each image src and anchor link. Each call goes through the startElement() function, which updates the URLs as needed.
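The per-element callback pattern can be tried outside AEM with the SAX classes that ship with the JDK. The sketch below is not the Sling rewriter pipeline: XMLFilterImpl stands in for the Transformer, a DefaultHandler stands in for the next pipeline step, and the class SaxRewriteSketch with its rewrite method is hypothetical. It shows the two points that matter in startElement(): attributes are copied before being changed, and every element is forwarded whether or not it was modified.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Illustrative only: a SAX filter that trims href values and forwards all elements. */
final class SaxRewriteSketch {

    static final class HrefTrimmer extends XMLFilterImpl {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            AttributesImpl copy = new AttributesImpl(atts); // never mutate the incoming object
            int href = copy.getIndex("href");
            if ("a".equalsIgnoreCase(qName) && href > -1) {
                String value = copy.getValue(href);
                if (value.startsWith("/content/practice/")) {
                    copy.setValue(href, value.substring("/content/practice".length()));
                }
            }
            super.startElement(uri, localName, qName, copy); // forward every element
        }
    }

    /** Parses the markup through the filter and records what the next handler receives. */
    static List<String> rewrite(String xhtml) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            HrefTrimmer filter = new HrefTrimmer();
            filter.setParent(reader);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    String href = atts.getValue("href");
                    seen.add(href == null ? qName : qName + " href=" + href);
                }
            });
            filter.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse sample markup", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
            "<div><a href=\"/content/practice/us/en/home.html\">Home</a><p>text</p></div>"));
    }
}
```

Running main prints [div, a href=/us/en/home.html, p]: the anchor is rewritten, and the div and p elements still reach the downstream handler.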

Sling Rewriter Configurations

Open the URL below to see all Sling rewriters deployed on the current AEM instance.

http://localhost:4502/system/console/status-slingrewriter

Imran Khan
