After moving my personal website to my home DMZ network instead of a public cloud reverse proxy, I have increased my self-hosted footprint. Given how well cheap VPSs had performed for me in the past, I knew that some spare small-form-factor Lenovo desktops with more than one core and literally an order of magnitude more memory would do just fine for the same applications I had run in the public cloud. Increasing this hardware footprint gave me an excuse to revisit an old friend from my past role at RELEX: Apache ZooKeeper.
ZooKeeper is a key-value store for managing the state of distributed applications. The project started at Yahoo!, where many engineering teams working on distributed applications ended up duplicating effort solving the same problems while introducing the same failure modes into their applications. What became ZooKeeper was a common solution that let engineers focus more on business logic and less on reading distributed systems academic research. The service is generally run as an ensemble of 2N - 1 servers where N ≥ 2, a configuration motivated by fault tolerance and leader election. All writes to the data store are routed through a single leader node in the ZooKeeper ensemble, and a write is committed only once it is acknowledged by a majority of the cluster members, leader included. This allows the data store to remain operational provided that a majority of the cluster is available. On ensemble startup, or if the leader becomes unavailable, the ensemble elects a new leader. An odd number of nodes prevents a 50/50 deadlocked leader election in these cases. 1
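The majority rule above is worth making concrete: an ensemble of 2N - 1 servers tolerates N - 1 failures, and adding an even-numbered server raises the quorum without improving fault tolerance. A quick sketch of the arithmetic:

```typescript
// Majority (quorum) size for an ensemble of `size` servers.
const quorum = (size: number): number => Math.floor(size / 2) + 1;

// Failures an ensemble can survive while still reaching a quorum.
const tolerated = (size: number): number => size - quorum(size);

// A 5-server ensemble needs 3 acknowledgements and survives 2 failures.
console.log(quorum(5), tolerated(5)); // 3 2

// A 6th server raises the quorum to 4 but still only survives 2 failures,
// which is why ensembles are run with an odd number of members.
console.log(quorum(6), tolerated(6)); // 4 2
```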
While ZooKeeper is normally used for more interesting things, I decided to use it for service discovery and load
balancing over the replicas serving the test.iainschmitt.com static website.
On startup, every replica writes to a /targets/$hostName znode; a znode is a node in the ZooKeeper data store.
ZooKeeper supports both persistent znodes, which last until explicitly deleted, and ephemeral ones, which are deleted
once the client that created them disconnects from the ZooKeeper ensemble. By giving replica znodes ephemeral
lifetimes, unreachable target replicas are automatically removed from consideration by the reverse proxy.
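The ephemeral semantics can be modelled without a real ensemble: a znode is tied to the session that created it, and dies with that session. Everything below (`FakeEnsemble` and its methods) is an in-memory stand-in for illustration, not the actual ZooKeeper client API:

```typescript
// In-memory stand-in for ephemeral znode semantics; not the real ZooKeeper API.
class FakeEnsemble {
  private znodes = new Map<string, { data: string; session: number }>();

  // Ephemeral znodes are tied to the creating client's session.
  createEphemeral(session: number, path: string, data: string): void {
    this.znodes.set(path, { data, session });
  }

  // When a session ends, every ephemeral znode it created disappears with it.
  closeSession(session: number): void {
    for (const [path, z] of this.znodes) {
      if (z.session === session) this.znodes.delete(path);
    }
  }

  children(prefix: string): string[] {
    return [...this.znodes.keys()].filter((p) => p.startsWith(prefix + "/"));
  }
}

const zk = new FakeEnsemble();
zk.createEphemeral(1, "/targets/replica-a", "0");
zk.createEphemeral(2, "/targets/replica-b", "0");
console.log(zk.children("/targets")); // both replicas visible
zk.closeSession(1); // replica-a's client disconnects
console.log(zk.children("/targets")); // only replica-b remains
```

This is exactly the property the proxy relies on: a crashed replica takes its own /targets entry down with it, with no cleanup code required.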
When an uncached request for a particular URL reaches the reverse proxy, it lists the children of /targets to
determine potential reverse proxy targets. The value stored in the /targets/$hostName znode is the cumulative count of
requests to that target, so the target with the fewest requests is selected and its count incremented if the connection
succeeds. If the first attempted target fails to respond, the next least used is attempted. The request cache
is cleared whenever a new replica comes online, which most commonly happens during an application update.
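The selection rule described above — fewest cumulative requests first, falling back to the next least used — is just an ordering over the per-target counts. A minimal sketch (the target names and the `pickOrder` helper are my own, not lifted from the actual proxy):

```typescript
// Order targets by cumulative request count, fewest first, so the proxy can
// try them in sequence until one responds.
const pickOrder = (counts: Record<string, number>): string[] =>
  Object.entries(counts)
    .sort(([, a], [, b]) => a - b)
    .map(([host]) => host);

const counts = { "replica-a": 12, "replica-b": 3, "replica-c": 7 };
console.log(pickOrder(counts)); // ["replica-b", "replica-c", "replica-a"]

// On a successful connection the winner's count is incremented,
// which keeps the ordering fresh for the next uncached request.
counts["replica-b"] += 1;
```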
Setting things up this way meant few changes to the portfolio website itself; almost all the new code was in the
reverse proxy, which I wrote using Node. There would have been better choices as far as ZooKeeper support goes:
while the zookeeper NPM package is actively maintained, it falls back on more Promise<any[]> type definitions than
I'd prefer, which may have to do with the native C libraries the client is built on.2 I can't say I've done a
comprehensive side-by-side comparison, but the official ZooKeeper client written by the project team looks to be a
lot more complete.
Having an 'outer' request made to the reverse proxy as well as an 'inner' request made by the reverse proxy to the target is something that I didn't do correctly at first as shown by this toy example:
import { createServer, IncomingMessage, ServerResponse } from "node:http";
import http from "node:http";

createServer((req: IncomingMessage, res: ServerResponse) => {
  const options = {
    hostname: "127.0.0.1",
    port: 4000,
    method: "GET",
    path: req.url,
  };
  const proxyReq = http.request(options, (proxyRes) => {
    proxyRes.on("data", (chunk) => {
      res.writeHead(200, { "Content-Type": "text/plain" });
      res.end(chunk);
    });
    proxyRes.on("end", () => {
      proxyReq.end();
      res.end();
    });
  });
  proxyReq.on("error", (e) => {
    res.writeHead(500);
    res.end(e.message);
  });
}).listen(5001);
When the previous server was run, the inner proxyRes handler for 'data' events was never called, so it didn't
function as an actual reverse proxy. I must have skipped this line in the Node docs the first time: 3
In the example req.end() was called. With http.request() one must always call req.end() to signify the end of the request - even if there is no data being written to the request body.
After calling end, a readable stream of the request to the target server must be handled using event listeners. This
readable stream can also emit multiple 'data' events, so a working reverse proxy looks something like the following:
createServer((outerReq: IncomingMessage, outerRes: ServerResponse) => {
  const proxyReq = http.request({
    hostname: "localhost",
    port: 4000,
    method: "GET",
    path: outerReq.url,
  });
  // The request must be explicitly ended before a response can be expected.
  proxyReq.end();
  proxyReq.on("response", (proxyRes) => {
    outerRes.writeHead(proxyRes.statusCode || 200, proxyRes.headers);
    proxyRes.setEncoding("utf-8");
    const chunks: string[] = [];
    // A readable stream may emit any number of 'data' events before 'end'.
    proxyRes.on("data", (chunk) => {
      chunks.push(chunk);
    });
    proxyRes.on("end", () => {
      outerRes.write(chunks.join(""));
      outerRes.end();
    });
  });
  proxyReq.on("error", () => {
    outerRes.writeHead(502);
    outerRes.end();
  });
}).listen(5000);
I hadn't paid much attention to the Node networking APIs before, and until reading the Node chapter of JavaScript: The
Definitive Guide I didn't have a great handle on them. 4 This was a book I already had a great deal of respect for
given its comprehensive detail, so I wasn't surprised that the Node chapter was also quite well written. The way that
readable and writable streams work is relatively intuitive, but the Node documentation would be better if it included
TypeScript types and was clearer about which events can be emitted by which readable streams. It seems strange that
strings are used to represent arbitrary events. While there are readable stream method type signatures like
on(event: "data", listener: (chunk: any) => void): this, there is also a permissive
on(event: string | symbol, listener: (...args: any[]) => void): this to support custom EventEmitter instances. 5
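The permissive overload exists because any string can name an event on a custom emitter; only the built-in streams get the narrow 'data'/'end' signatures. A small example with a made-up event name:

```typescript
import { EventEmitter } from "node:events";

// Any string can name an event on a custom emitter, which is why the
// catch-all (event: string | symbol, ...) overload has to exist.
const emitter = new EventEmitter();
const seen: string[] = [];

emitter.on("replica-added", (host: string) => seen.push(host));

// emit() synchronously invokes every listener registered for that name.
emitter.emit("replica-added", "replica-a");
console.log(seen); // ["replica-a"]
```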
Event emitting is a pretty unique flow control primitive which I haven't seen an equivalent of in other languages.
Because of this, one early mistake I made was trying to catch errors by wrapping the entire createServer call
in a try/catch, which of course does nothing to handle 'error' events. A more embarrassing moment came when my
reverse proxy was failing a local load test, at which point I captured a flame graph that pointed out something
I should have seen from reading my ZooKeeper-enabled reverse proxy more carefully: I was reading from ZooKeeper
regardless of whether I had a response cached. After fixing this, the load test passed. This reminds me of what it
is like to overuse interactive debuggers: oftentimes they'll tell you exactly what you would have figured out if you
had simply read the code more methodically; a lot of debugging ultimately boils down to reading comprehension.
This ZooKeeper-enabled reverse proxy currently serves test.iainschmitt.com
across two replicas. This is naturally ridiculous overkill, and there are dozens of out-of-the-box solutions that do
it better, but given that this isn't for production use, that attitude is no fun. With that said, one less obvious
way in which this is an absurd solution is that ZooKeeper was designed for distributed system workloads with more
reads than writes, yet right now the /targets znode is queried before updating the cumulative request count of the
chosen target server, making for a 1:1 ratio between reads and writes. 6 Right now I'm operating the single reverse
proxy server, a one-server ZooKeeper ensemble, and both target server replicas on the same physical host, but that's
just a little bit of system administration away from being fixed.
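One way to push the ratio back toward reads — purely a sketch, not what the proxy currently does — would be to keep a short-lived local copy of the target list so ZooKeeper is only consulted when the copy expires. Here `fetchTargets` stands in for the real children-listing call:

```typescript
// Short-lived local copy of the /targets children; ZooKeeper is only read
// when the copy has expired. `fetchTargets` stands in for the real call.
const makeTargetCache = (fetchTargets: () => string[], ttlMs: number) => {
  let cached: string[] = [];
  let fetchedAt = -Infinity;
  return (now: number): string[] => {
    if (now - fetchedAt > ttlMs) {
      cached = fetchTargets();
      fetchedAt = now;
    }
    return cached;
  };
};

let reads = 0;
const getTargets = makeTargetCache(() => {
  reads += 1;
  return ["replica-a", "replica-b"];
}, 1000);

getTargets(0); // expired copy: hits ZooKeeper
getTargets(500); // served from the local copy
getTargets(2000); // TTL elapsed: hits ZooKeeper again
console.log(reads); // 2
```

The trade-off is staleness: a replica that dies mid-TTL is still offered as a target until the copy expires, which is exactly the situation the per-request read currently avoids.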
Apache ZooKeeper provides a few conveniences for notifying clients about changes in the data store. ZooKeeper watches
are one of these features, and I put them to use for clearing the reverse proxy cache when a new replica becomes
available. A target server writes the current datetime to /cacheAge during startup, and the reverse proxy calls the
function below to clear the request cache accordingly. Because watches only last for a single change notification,
in the code below I reset the watch every time it is triggered, but there must be a more elegant and error-resilient
way to do this.
export const cacheResetWatch = async (client: ZooKeeper, path: string, cache: NodeCache) => {
  if ((await getMaybeZnode(client, path)).isSome()) {
    client.aw_get(
      path,
      (_type, _state, _path) => {
        console.log("Clearing cache");
        cache.flushAll();
        // Watches are one-shot, so re-register after every notification.
        cacheResetWatch(client, path, cache);
      },
      (_rc, _error, _stat, _data) => {},
    );
  }
};
The caching logic in general needs work: the cache TTL is 120 seconds, and if the current target server's git commit were recorded in ZooKeeper rather than the timestamp of the last replica restart, the cache could be cleared only when the content of the target server has actually changed.
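That commit-based scheme would be a small change: remember the last commit the proxy observed and clear only when the stored value differs. A sketch under those assumptions (the znode contents and helper names are hypothetical, not current code):

```typescript
// Clear the cache only when the deployed commit actually changes, rather
// than on every replica restart. `lastSeen` is what the proxy remembers;
// the argument is what a hypothetical version znode currently holds.
const makeInvalidator = (clearCache: () => void) => {
  let lastSeen: string | null = null;
  return (currentCommit: string): void => {
    if (lastSeen !== null && lastSeen !== currentCommit) clearCache();
    lastSeen = currentCommit;
  };
};

let clears = 0;
const onVersion = makeInvalidator(() => {
  clears += 1;
});

onVersion("abc123"); // first observation: nothing to clear
onVersion("abc123"); // replica restarted with same content: cache kept
onVersion("def456"); // new content deployed: cache cleared
console.log(clears); // 1
```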
Since leaving RELEX, I've heard many complaints about my favourite fault-tolerant key-value store, such as this one from episode #116 of the Ship It! DevOps podcast: 7
The worst outage I ever had is I was at Elastic, an engineering all hands in Berlin. It was a great place. I loved it. So all the SREs were there. And we did this to ourselves. Let me just preface this by saying… Because we relied on something that you should never rely on, and it’s called Zookeeper.
Half of the gray in this beard is from Zookeeper. So many things that you know, and probably love, and also hate… You probably love it if you don’t have to actually do the operations for Zookeeper, and if you’re on operations with Zookeeper, you absolutely hate Zookeeper. Zookeeper is the bane of your infrastructure, necessary as it may be.
As someone who was on the operations side of ZooKeeper, I have to disagree. But given that I went to the effort of shoehorning it into a static site server, of course I disagree.
Benjamin Reed and Flavio Junqueira. ZooKeeper. O'Reilly Media, Sebastopol, CA.↩︎
David Flanagan. JavaScript: The Definitive Guide (7th ed.). O'Reilly Media, Sebastopol, CA.↩︎
Hunt, P., Konar, M., Junqueira, F. P., and Reed, B. "ZooKeeper: Wait-free Coordination for Internet-Scale Systems", in USENIX ATC, June 2010.↩︎