Introduction

In the article “Traefik with Step-CA for Automatic HTTPS”, we set up certificate issuance based on tlsChallenge and httpChallenge. However, neither of these methods can issue wildcard certificates.

Wildcard certificates can only be obtained through dnsChallenge, and the general process of dnsChallenge is as follows:

[Figure: DNS Challenge workflow]

In other words, the most critical part here is the DNS API. The CA issues a challenge token and returns it to the ACME client. The client derives the TXT value from that token and writes it to the DNS as a TXT record through the DNS API, and then the CA verifies through DNS queries whether the domain belongs to the applicant!
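To make the TXT value concrete: under RFC 8555 the client does not write the CA's token verbatim. It joins the token with a thumbprint of its own account key and publishes the base64url-encoded SHA-256 digest of the result. The sketch below shows only that derivation; the function names and the sample token and thumbprint are hypothetical, and a real client such as lego does this internally.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as ACME (RFC 8555) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Return the TXT value for a DNS-01 challenge.

    key authorization = token + "." + thumbprint of the account key;
    the TXT record holds the base64url SHA-256 digest of that string.
    """
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


def challenge_record_name(domain: str) -> str:
    """Return the validation name; a wildcard and its base domain share one."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base.rstrip('.')}."


# Made-up inputs, for illustration only.
value = dns01_txt_value("example-token", "example-thumbprint")
print(challenge_record_name("*.svc.dev"), "TXT", value)
```

A SHA-256 digest always encodes to 43 characters this way, which is why DNS APIs for this challenge can validate the length of the value they are given.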

ACME DNS

Usually, cloud DNS service providers offer an API that ACME clients can use, but some traditional DNS service providers do not, which is why ACME DNS exists! If you’re interested, you can read the article “A Technical Deep Dive: Securing the Automation of ACME DNS Challenge Validation”.

Project repository: joohoi/acme-dns

The configuration of ACME DNS is as follows:

[general]
# DNS interface. Note that systemd-resolved may reserve port 53 on 127.0.0.53
# In this case acme-dns will error out and you will need to define the listening interface
# for example: listen = "127.0.0.1:53"
listen = "0.0.0.0:53"
# protocol, "both", "both4", "both6", "udp", "udp4", "udp6" or "tcp", "tcp4", "tcp6"
protocol = "both"
# domain name to serve the requests off of
domain = "dns.svc.dev"
# zone name server
nsname = "dns.svc.dev"
# admin email address, where @ is substituted with .
nsadmin = "admin.svc.dev"
# predefined records served in addition to the TXT
records = [
    # Traefik container IP
    "*.svc.dev. A 10.8.10.252",

    # Step-CA container IP
    "ca.svc.dev. A 10.8.10.254",

    # domain pointing to the public IP of your acme-dns server 
    "dns.svc.dev. A 10.8.10.253",

    # specify that dns.svc.dev will resolve any *.dns.svc.dev records
    "dns.svc.dev. NS dns.svc.dev."
]
# debug messages from CORS etc
debug = true

[database]
# Database engine to use, sqlite3 or postgres
engine = "sqlite3"
# Connection string, filename for sqlite3 and postgres://$username:$password@$host/$db_name for postgres
# Please note that the default Docker image uses path /var/lib/acme-dns/acme-dns.db for sqlite3
connection = "/database/acme-dns.db"
# connection = "postgres://acme-dns:[email protected]:5432/acme-dns?sslmode=disable"

[api]
# listen ip eg. 127.0.0.1
ip = "0.0.0.0"
# possible values: "letsencrypt", "letsencryptstaging", "cert", "none"
tls = "letsencrypt"
# listen port, eg. 443 for default HTTPS
port = "443"
# disable registration endpoint
disable_registration = false
# only used if tls = "cert"
tls_cert_privkey = "/etc/tls/example.org/privkey.pem"
tls_cert_fullchain = "/etc/tls/example.org/fullchain.pem"
# only used if tls = "letsencrypt"
acme_cache_dir = "certs"
# optional e-mail address to which Let's Encrypt will send expiration notices for the API's cert
notification_email = "[email protected]"
# CORS AllowOrigins, wildcards can be used
corsorigins = [
    "*"
]
# use HTTP header to get the client ip
use_header = false
# header name to pull the ip address / list of ip addresses from
header_name = "X-Forwarded-For"

[logconfig]
# logging level: "error", "warning", "info" or "debug"
loglevel = "debug"
# possible values: stdout, TODO file & integrations
logtype = "stdout"
# file path for logfile TODO
# logfile = "./acme-dns.log"
# format, either "json" or "text"
logformat = "text"
  • tls = "letsencrypt" in the [api] section: I also enabled HTTPS for the ACME DNS API, because ACME DNS has a built-in ACME client that can obtain its own certificate.

Issues to note when enabling HTTPS for ACME DNS API:

  • ACME DNS only supports the letsencrypt and letsencryptstaging API endpoints, which are hardcoded into the program. So if you want to replace letsencrypt or letsencryptstaging with a local Step-CA, you need to make some settings when running the container! And because Step-CA’s root certificate is not trusted by default, you need to add it to the container. For specific methods, please refer to the article “Trusting Self-signed CA Certificates in Local Docker Environment”.

  • Add the domain names of letsencrypt and letsencryptstaging to the environment variable DOCKER_STEPCA_INIT_DNS_NAMES in the Step-CA container, so that when ACME DNS sends requests to Step-CA, they can communicate normally!

Below is the docker-compose.yaml I wrote according to the official Self-hosted documentation:

services:
  acme-dns:
    image: joohoi/acme-dns:latest
    labels:
      - traefik.enable=false
    restart: always
    volumes:
      - ./certs:/certs
      - ./database:/database
      - ./config:/etc/acme-dns:ro
    hostname: acme-dns
    networks:
      traefik:
        ipv4_address: 10.8.10.253
    extra_hosts:
      - ca.svc.dev:10.8.10.254
      - acme-v02.api.letsencrypt.org:10.8.10.254
      - acme-staging-v02.api.letsencrypt.org:10.8.10.254
    environment:
      - TZ=Asia/Shanghai
    container_name: acme-dns

networks:
  traefik:
    external: true
  • extra_hosts: Forcibly point the letsencrypt and letsencryptstaging domain names (acme-v02.api.letsencrypt.org and acme-staging-v02.api.letsencrypt.org) at the internal Step-CA container IP. This step is very important!

After the ACME DNS service is started, check whether the API service has obtained a certificate from Step-CA. If everything is normal, test whether the API is working properly using the request below:

curl -X POST https://dns.svc.dev/register | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   235  100   235    0     0    768      0 --:--:-- --:--:-- --:--:--   767
{
  "username": "e1181993-6e69-4f4b-90f5-e33f383d5444",
  "password": "FUfLiaavn0e4ssrtJZbVt7FimNBgDvsEerRkkVPx",
  "fulldomain": "008a8c8a-d5a8-4ea6-964e-651f09220763.dns.svc.dev",
  "subdomain": "008a8c8a-d5a8-4ea6-964e-651f09220763",
  "allowfrom": []
}
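The credentials returned by /register are what a client later sends to the /update endpoint to publish a TXT value. As a rough sketch of that second call, the helper below only builds the request and does not send it. It relies on the acme-dns API convention of X-Api-User and X-Api-Key headers with a JSON body containing subdomain and txt; the helper name is hypothetical, and the TXT value is a dummy 43-character string.

```python
import json
import urllib.request


def build_update_request(api_base: str, account: dict, txt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /update request for acme-dns.

    `account` is the JSON object returned by /register.
    """
    # A DNS-01 validation value is always 43 characters long.
    if len(txt) != 43:
        raise ValueError("TXT value must be exactly 43 characters")
    body = json.dumps({"subdomain": account["subdomain"], "txt": txt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base.rstrip('/')}/update",
        data=body,
        method="POST",
        headers={
            "X-Api-User": account["username"],
            "X-Api-Key": account["password"],
            "Content-Type": "application/json",
        },
    )


# Sample values taken from the /register response above; the TXT value is a dummy.
account = {
    "username": "e1181993-6e69-4f4b-90f5-e33f383d5444",
    "password": "FUfLiaavn0e4ssrtJZbVt7FimNBgDvsEerRkkVPx",
    "subdomain": "008a8c8a-d5a8-4ea6-964e-651f09220763",
}
request = build_update_request("https://dns.svc.dev", account, "x" * 43)
print(request.get_method(), request.full_url)
```

Actually sending the request from a script would additionally require trusting the Step-CA root certificate, just as the containers do.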

After the API is working properly, we also need to configure Traefik’s docker-compose.yaml file:

services:
  traefik:
    ......
    networks:
      traefik:
        ipv4_address: 10.8.10.252
    command: 
      ......
      - --entrypoints.https.address=:443
      - --entryPoints.https.http3.advertisedport=443
      - --entryPoints.https.http.tls.certResolver=step-ca
      - --entryPoints.https.http.tls.domains[0].main=svc.dev
      - --entryPoints.https.http.tls.domains[0].sans=*.svc.dev

      - --certificatesresolvers.step-ca.acme.email=[email protected]
      - --certificatesresolvers.step-ca.acme.storage=/certs/acme.json
      - --certificatesresolvers.step-ca.acme.caserver=https://ca.svc.dev/acme/acme/directory
      - --certificatesresolvers.step-ca.acme.tlschallenge=false
      - --certificatesresolvers.step-ca.acme.dnschallenge=true
      - --certificatesresolvers.step-ca.acme.dnschallenge.provider=acme-dns
      - --certificatesresolvers.step-ca.acme.httpChallenge=false
    volumes:
      - step-ca:/step-ca:ro
      - ./certs/:/certs/:rw
      - ./config/:/etc/traefik/config/:ro
      - /var/run/docker.sock:/var/run/docker.sock
    extra_hosts:
      - ca.svc.dev:10.8.10.254
      - dns.svc.dev:10.8.10.253
    environment:
      - TZ=Asia/Shanghai
      - ACME_DNS_API_BASE=https://dns.svc.dev
      - ACME_DNS_STORAGE_PATH=/certs/lego-acme-dns-accounts.json

      - LEGO_CA_CERTIFICATES=/step-ca/certs/root_ca.crt
      - LEGO_DISABLE_CNAME_SUPPORT=false
    container_name: traefik

volumes:
  step-ca:
    name: step-ca
    external: true

networks:
  traefik:
    external: true
  • --certificatesresolvers.step-ca.acme.dnschallenge.provider=acme-dns: Use acme-dns as the provider for dnsChallenge
  • ACME_DNS_API_BASE and ACME_DNS_STORAGE_PATH: The environment variables required by the acme-dns provider

After completing the above configuration, recreate the Traefik container. If all goes well, a lego-acme-dns-accounts.json file will be created in the certs directory with the following structure:

{
    "FQDN": {
        "username": "e1181993-6e69-4f4b-90f5-e33f383d5444",
        "password": "FUfLiaavn0e4ssrtJZbVt7FimNBgDvsEerRkkVPx",
        "subdomain": "008a8c8a-d5a8-4ea6-964e-651f09220763",
        "fulldomain": "008a8c8a-d5a8-4ea6-964e-651f09220763.dns.svc.dev",
        "allowfrom": ["IP"]
    }
}

Because we want to apply for a wildcard certificate for *.svc.dev, we need to configure a CNAME record for _acme-challenge.svc.dev in the general.records array of the ACME DNS configuration file:

records = [
    "_acme-challenge.svc.dev. CNAME 008a8c8a-d5a8-4ea6-964e-651f09220763.dns.svc.dev.",
]

Then restart ACME DNS and wait for Traefik and Step-CA to execute the entire dnsChallenge process…
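Since the CNAME line is fully determined by the accounts file, it can at least be generated rather than typed by hand. The sketch below turns the contents of lego-acme-dns-accounts.json into records entries. It assumes the keys are the requested domains (with or without a leading "*.") and that each account carries a fulldomain field like the /register response; cname_records is a hypothetical helper, not part of ACME DNS or lego.

```python
def cname_records(accounts: dict) -> list:
    """Build one `_acme-challenge` CNAME line per domain in the accounts file."""
    records = {}
    for fqdn in sorted(accounts):
        # "*.svc.dev" and "svc.dev" are validated under the same name.
        base = fqdn[2:] if fqdn.startswith("*.") else fqdn
        name = f"_acme-challenge.{base.rstrip('.')}."
        target = accounts[fqdn]["fulldomain"].rstrip(".")
        # A name may carry only one CNAME, so keep the first one seen.
        records.setdefault(name, f"{name} CNAME {target}.")
    return list(records.values())


# Sample data shaped like the accounts file shown earlier.
accounts = {
    "*.svc.dev": {
        "fulldomain": "008a8c8a-d5a8-4ea6-964e-651f09220763.dns.svc.dev",
    },
}
for line in cname_records(accounts):
    print(f'    "{line}",')
```

In practice the dictionary would come from json.load on certs/lego-acme-dns-accounts.json, and the printed lines would still have to be copied into the records array before restarting ACME DNS.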

Principle Analysis

The ACME DNS approach is itself a workaround, and the key to it is the CNAME record. Because the authoritative server does not support a DNS API, the only option is to delegate queries for _acme-challenge.tld. to ACME DNS through a CNAME, so that ACME DNS handles and answers them. The general process is as follows:

  1. Traefik checks whether a certificate for the FQDN exists in certs/acme.json based on the FQDN defined in the route
  2. If the certificate does not exist, it checks whether an account for the FQDN exists in the certs/lego-acme-dns-accounts.json file
  3. If no ACME DNS Account exists, it calls the /register API to create one, and then writes the result to certs/lego-acme-dns-accounts.json
  4. It submits the certificate request to Step-CA, and after obtaining the token for verification, it creates a TXT record for 008a8c8a-d5a8-4ea6-964e-651f09220763.dns.svc.dev through the ACME DNS /update API
  5. Step-CA sends a DNS query to validate the challenge, but because it knows nothing about 008a8c8a-d5a8-4ea6-964e-651f09220763.dns.svc.dev, it only queries whether the FQDN _acme-challenge.svc.dev has a corresponding TXT record

So before step 5, we need to manually add a CNAME record in ACME DNS, which is also the part that feels quite disjointed to me. If I need wildcard certificates for multiple second-level domains, each one requires its own manually added CNAME record.
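The role of the CNAME can be illustrated with a toy in-memory zone. This is not how Step-CA or any real resolver is implemented; it only models the rule that a TXT lookup follows a CNAME to its target. The lookup_txt function and the dummy TXT value are assumptions made for the example.

```python
def lookup_txt(zone: dict, name: str, max_hops: int = 8) -> list:
    """Return TXT values for `name`, following CNAME records in `zone`.

    `zone` maps (name, record type) to a list of values.
    """
    for _ in range(max_hops):  # bound the chain to avoid CNAME loops
        txt = zone.get((name, "TXT"))
        if txt:
            return txt
        cname = zone.get((name, "CNAME"))
        if not cname:
            return []
        name = cname[0]
    return []


account_name = "008a8c8a-d5a8-4ea6-964e-651f09220763.dns.svc.dev."

# Without the CNAME, the TXT record written in step 4 is unreachable.
without_cname = {(account_name, "TXT"): ["dummy-validation-value"]}
print(lookup_txt(without_cname, "_acme-challenge.svc.dev."))

# With the CNAME, the same query ends up at the ACME DNS account record.
with_cname = dict(without_cname)
with_cname[("_acme-challenge.svc.dev.", "CNAME")] = [account_name]
print(lookup_txt(with_cname, "_acme-challenge.svc.dev."))
```

The first lookup comes back empty and the second returns the value, which is the whole reason the manual CNAME step cannot be skipped.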

Unexpected Situation

If nothing unexpected happens, something unexpected will happen. Although this is a path that others have walked before, there is still a big pitfall. As for what the problem is, you can see the Issue I raised. As of the time of writing, I have not received an answer from the author!

So now I can only rely on myself…

The Birth of CDNS

ACME DNS looks good, but in practice it fell far short of my expectations! After reading the source code, I modified some of it so that Step-CA could verify the TXT record properly, but this still does not achieve fully automatic certificate issuance!

The reason is the one I mentioned earlier: each FQDN needs its own CNAME record, and adding it cannot be automated! So I came up with the idea of developing a DNS server specifically for dnsChallenge on internal networks.

Project address: betterde/cdns. Why is it called CDNS? Because it solves the dnsChallenge problem, so the C is an abbreviation for Challenge.

Through this project, it is possible to apply for and verify wildcard certificates for all TLDs! I will share the specific configuration and final effect in the next article, so stay tuned…

Conclusion

The ACME protocol seems perfect, but to achieve complete automation in an internal network, there are still many twists and turns. However, as infrastructure, once it is set up, the development experience will be extremely comfortable! At least in my workflow, I no longer need to manually generate wildcard certificates for each project, nor do I need to add certificates to Traefik’s configuration file!

I hope this is helpful. Happy hacking…