pwa.io

Building an all-in-one container to create PDFs with Browsershot

Ditch the bloat and harness the agility of a sleek Docker container, merging Puppeteer, Node.js, and alpine-chrome for lightning-fast web page screenshots and PDF conversions.

In the past few months, I've encountered several situations where I needed to capture web page screenshots or convert HTML templates into PDF files for printing. Fortunately, we now have the convenience of using a headless Chrome browser instead of dealing with tools like wkhtmltopdf. Although it requires significant storage and resources to run an entire browser, the benefits of using a modern browser and simplified debugging of rendering quirks outweigh the drawbacks.

Browsershot

One tool I've been using for this purpose is Spatie's Browsershot, a fantastic wrapper around Puppeteer, the Node.js library for controlling Chrome. It's incredibly user-friendly, especially during local development. Just run npm i -g puppeteer and you're good to go.

However, things change when you want to deploy Browsershot on your production server. Since I've recently migrated all my side projects to Docker containers, I aim to keep them as small as possible. While it's technically feasible to install Node.js, Puppeteer, and Chromium in my primary application container, it feels messy and would increase the image size by several hundred megabytes. It also contradicts a fundamental idea behind Docker containers: one service per container.

In my quest for an optimal Chromium container, I stumbled upon Zenika's alpine-chrome, which strives to be the smallest option available. Excellent! Unfortunately, using this container as a remote instance for Browsershot still requires installing Node.js and Puppeteer in my primary application container, which is exactly what I wanted to avoid.

A new container

That's when I decided to build a new container based on alpine-chrome. It contains a simple Node.js application that listens for Browsershot requests, forwards them to Puppeteer, and returns the results. This way, my main image stays compact and the two concerns are cleanly separated.

Admittedly, it would be cleaner in theory to create a separate container solely for the Node.js component and leave the alpine-chrome container untouched. However, I opted for an all-in-one solution to avoid the complexity of managing multiple containers. Besides, since Chromium is started and terminated by Puppeteer on demand rather than running continuously as a service, the container still fits my "one service per container" rule.
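To make the idea concrete, here is a minimal sketch of such a server: it listens for a request, hands it to Puppeteer, and returns the result. This is not the actual browsershot-aio code. The JSON request shape (url, type), the function names, the base64 JSON response, and the --no-sandbox flag are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the real browsershot-aio server.
const http = require('node:http');

// Validate the JSON body of a render request (this shape is an assumption).
function parseRenderRequest(rawBody) {
  const { url, type = 'pdf' } = JSON.parse(rawBody);
  if (typeof url !== 'string' || !/^https?:\/\//.test(url)) {
    throw new Error('url must be an http(s) URL');
  }
  if (type !== 'pdf' && type !== 'screenshot') {
    throw new Error('type must be "pdf" or "screenshot"');
  }
  return { url, type };
}

// Start Chromium for this request only, render, and always shut it down again.
async function renderWithPuppeteer({ url, type }) {
  const puppeteer = require('puppeteer'); // loaded lazily; must be installed
  // --no-sandbox is an assumption that is often needed inside containers.
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    const output = type === 'pdf'
      ? await page.pdf({ format: 'A4' })
      : await page.screenshot({ type: 'png' });
    return Buffer.from(output).toString('base64');
  } finally {
    await browser.close();
  }
}

// Build the HTTP server; the renderer is injectable so it can be swapped out.
function createRenderServer(render = renderWithPuppeteer) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.writeHead(405).end();
      return;
    }
    let body = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', async () => {
      try {
        const result = await render(parseRenderRequest(body));
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ result }));
      } catch (err) {
        res.writeHead(400, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ error: err.message }));
      }
    });
  });
}
```

Calling createRenderServer().listen(3000) would start it on the port used in the endpoint example further down. Launching the browser inside the request handler is what keeps Chromium from running as a permanent second service.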

This approach was heavily inspired by Stefan's sidecar-browsershot, which overrides Browsershot's callBrowser method so that the browser call is handled by an AWS Lambda function. I used the same strategy, except that my override calls the Node.js server in the new container.

To use this setup, start the container and tell BrowsershotAio to use it as its endpoint:

use pwaio\BrowsershotAio\BrowsershotAio;

BrowsershotAio::setEndpoint('http://chrome:3000');

// No shared volume needed: the screenshot comes back as a base64-encoded string
$data = BrowsershotAio::url('https://example.com')->base64Screenshot();

// Saves an image; $pathToImage must point to a volume shared by both containers
BrowsershotAio::url('https://example.com')->save($pathToImage);

// Saves a PDF (also needs the shared volume); the .pdf extension selects the format
BrowsershotAio::url('https://example.com')->save('example.pdf');

You can find the source code and some usage details (though I wouldn't call it documentation) for browsershot-aio on GitHub.

(Note: I am currently using this container in one of my projects, and it works well. However, I cannot guarantee its compatibility in your specific setup. Also, please be aware that the Node server lacks authentication, so avoid exposing it to the public internet.)
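
If the server ever has to be reachable from outside a private Docker network, one option would be a shared-secret check in front of the request handler. The following is only a sketch of that idea and not part of browsershot-aio: the isAuthorized helper and the Bearer-token scheme are assumptions, and the PHP client would also need a way to send the header.

```javascript
// Hypothetical sketch: shared-secret check for incoming requests.
const crypto = require('node:crypto');

// Compare an "Authorization: Bearer <token>" header against a shared secret.
// Node.js lower-cases incoming header names, hence the 'authorization' key.
function isAuthorized(headers, secret) {
  const header = headers['authorization'] || '';
  const presented = header.startsWith('Bearer ') ? header.slice(7) : '';
  // Hash both values so timingSafeEqual always receives equal-length buffers.
  const a = crypto.createHash('sha256').update(presented).digest();
  const b = crypto.createHash('sha256').update(secret).digest();
  return presented.length > 0 && crypto.timingSafeEqual(a, b);
}
```

A request handler could call isAuthorized(req.headers, process.env.RENDER_TOKEN) first and answer with a 401 when it returns false; RENDER_TOKEN is a made-up variable name. Keeping the port unpublished remains the simpler safeguard.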