AtomicJar: shift integration testing left and get rid of staging
When I first joined Datadog, I had a lot of people ask me why I was so excited. I would proudly explain that "people want traces, metrics, and logs in a single platform" or "monitoring and ITSM are broken in cloud-native environments" and ... it mostly wouldn't click. Turns out, you can't just make bold claims and expect people to believe you unless you're ready to argue why.
In related news, several people have asked me why I'm so excited about joining AtomicJar.
It's all about "shifting integration tests left", "shortening dev cycles" and "getting rid of broken staging environments". But don't just take my word for it, read on!
Dependencies-as-code with Testcontainers
Sergei, Eli, and the team at AtomicJar are focused on growing the community around the Testcontainers open-source project (available in 8 languages, including Java) and building Testcontainers Cloud, a SaaS product that lets anyone seamlessly run their tests in the cloud.
Anyone who's written integration tests knows the never-ending frustration of maintaining a local test environment with k8s + postgres + kafka + etc. If you've ever packaged a web browser for Selenium UI tests, you've likely felt "step on Lego" levels of pain. Testcontainers are throwaway containers for dependencies that you invoke as regular code. For example, the following Java snippet gets you an ephemeral Redis container for your tests. Neat!
@Container
public GenericContainer<?> redis = new GenericContainer<>("redis:5.0.3-alpine")
        .withExposedPorts(6379);
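For context, here's roughly how that snippet sits inside a JUnit 5 test. This is only a sketch: it assumes the org.testcontainers:junit-jupiter dependency on the test classpath and a Docker environment (local or Testcontainers Cloud), and it leaves the actual Redis client call out:

```java
// Sketch: the Redis container embedded in a JUnit 5 test.
// Requires org.testcontainers:junit-jupiter and a Docker environment.
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@Testcontainers
class RedisBackedTest {

    // Testcontainers starts this before the test and tears it down after.
    @Container
    public GenericContainer<?> redis =
            new GenericContainer<>("redis:5.0.3-alpine").withExposedPorts(6379);

    @Test
    void talksToRedis() {
        // Port 6379 is mapped to a random free host port, so tests
        // can run in parallel without colliding.
        String host = redis.getHost();
        int port = redis.getMappedPort(6379);
        // ... point your Redis client of choice at host:port ...
    }
}
```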
Painless test execution, from laptop to CI
Of course, writing integration tests is only half the challenge, now you want to run them on your laptop and within your CI. Both are frustrating:
You start a bunch of containers on your machine and one of two things happens.
- You have a new M1/M2 MacBook. Good for you! Except some containers don't play well with your fancy ARM-based CPU. Maybe you can run them under emulation (slowly), maybe not. Fun times!
- A couple of Docker containers start, the CPU begins cooking your legs, the fan accelerates until the laptop takes off: you decide to get another coffee while the tests run. As always, there's a relevant xkcd.
You want your integration tests to run on each PR that hits your CI environment? That's not particularly pleasant either. Running Docker-in-Docker ("dind") is considered an anti-pattern and a source of heisenbugs. It also takes significant toil to orchestrate beefy CI machines to keep tests snappy. Finally, your CI and local tests are just different enough that some of them are "flaky", and random PRs go red for no clear reason.
There's got to be a better way! Well yes, that's where Testcontainers Cloud comes in. One click and all your Testcontainers-based tests run remotely. It feels exactly the same, except your tests are much faster and your laptop remains responsive. You don't need to rewrite anything. In fact you don't even need to run a local Docker environment! It's a massive boon to developer productivity both locally and in the CI.
Okay, but Marc, I didn't ask what the product does! I asked why you're so excited!
Betting on simple has never looked better
Right! Why does any of this matter? Integration tests are certainly important to ship reliable code, but so are canaries and a dozen other DevOps practices. Incidentally, have you tried writing No Code? What's the game-changer with AtomicJar?
If you're like me, you remember when Heroku came out and everyone suddenly felt like they could build a Rails app and scale it into a multi-million dollar business as a one-person team. It was incredibly empowering, and hundreds of successful SaaS companies we now take for granted were born in the Great Vibe Shift of 2010.
Over the following 10+ years, this feeling slowly faded as the pendulum swung towards specialized layers. The average stack now has a React SPA talking to a Go microservices backend orchestrated on Kubernetes. There's probably also a proprietary serverless offering at the edge, Kafka and GraphQL in the middle, and a managed database at the end. Don't get me wrong, this has led to significant across-the-board improvements in productivity, latency, stability, scalability, etc. The complexity increase was worth it.
As of 2022 we've reached peak complexity and specialization, and the pendulum is swinging back. People are looking to retain the productivity and performance gains while radically simplifying the stack and how they ship code for it. And so, there are obvious bets for the next decade, such as:
- You want full VMs at the edge with Heroku-like UX? Have a look at Fly.io.
- VPN, PAM, and cross-region networking are super complex? Take Tailscale for a spin.
- Testing cloud-native apps is too damn hard and staging keeps breaking? AtomicJar is on it.
The missing factors, or why staging is always broken
Back to Heroku. The platform worked well, but their Twelve-Factor App manifesto was arguably a bigger contributor to the wave of successful SaaS products that followed. First published in 2011, the methodology identified 12 factors that provided devs and ops with a solid playbook for architecting scalable SaaS apps. Over the years, most of these factors became de facto standards. It's now rare to see database migrations handled manually; instead you write a script, commit it, and then run it. That's factor XII: "Run admin/management tasks as one-off processes." But not all 12 factors came to pass. For some, the technology simply wasn't there.
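To make the factor XII example concrete, here's a minimal, hypothetical sketch of the idea behind scripted migrations: each migration is committed code, applied exactly once, in order. (The Migrator name and in-memory "applied" list are illustrative; in a real system the applied set lives in a schema_migrations table, managed by a tool like Flyway or Liquibase.)

```java
// Sketch only: versioned migrations applied exactly once, in order.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Migrator {
    // Returns the migrations that still need to run, preserving
    // the (insertion) order in which they were committed.
    static List<String> pending(Map<String, Runnable> migrations, List<String> applied) {
        List<String> todo = new ArrayList<>();
        for (String version : migrations.keySet()) {
            if (!applied.contains(version)) todo.add(version);
        }
        return todo;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps migrations in the order they were added.
        Map<String, Runnable> migrations = new LinkedHashMap<>();
        migrations.put("001_create_users", () -> {});
        migrations.put("002_add_email_index", () -> {});

        // 001 already ran, so only 002 is pending.
        List<String> applied = List.of("001_create_users");
        System.out.println(Migrator.pending(migrations, applied)); // [002_add_email_index]
    }
}
```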
Two principles in particular have either never seen full adoption, or have regressed as a result of increased complexity in cloud-native environments: IV. Treat backing services as attached resources and X. Keep development, staging, and production as similar as possible. Taken together, these principles underpin continuous deployment by keeping the gap between dev and prod small. Your app "makes no distinction between local and third party services" by treating any backing service as an attached resource. So other microservices and third-party dependencies all look the same to your code: they're a REST or gRPC call away. And because you "resist the urge to use different backing services between development and production", you can confidently test your code locally and trust that it will run as intended in production, even for interactions that span several microservices and backing services.
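A minimal sketch of what factor IV looks like in code (the REDIS_URL key and resolve helper are hypothetical): the address of a backing service comes from configuration, so dev, staging, and prod differ only in an environment variable, never in code.

```java
// Sketch only: factor IV (backing services as attached resources)
// with plain Java. REDIS_URL is a hypothetical config key.
import java.util.Map;

public class BackingServices {
    // Resolve a backing service's address from config, never from code.
    // Swapping a local container for a managed service is then a
    // config change, not a code change.
    static String resolve(Map<String, String> env, String key, String fallback) {
        return env.getOrDefault(key, fallback);
    }

    public static void main(String[] args) {
        // Locally this might fall back to a Testcontainers-provided
        // address; in production the same key points at the managed service.
        String redis = resolve(System.getenv(), "REDIS_URL", "redis://localhost:6379");
        System.out.println("redis -> " + redis);
    }
}
```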
But of course it was never that easy! Think about emulating a GCP service locally, or spinning up a k8s cluster on your MacBook with minikube. What about that in-memory database that's used only for testing? It all sort of works, but do you trust that your laptop and prod behave similarly enough that you can safely deploy? In my experience, that's not the case. Writing and running reliable integration tests has simply remained too hard.
Indeed, even mature engineering teams often painstakingly maintain a staging environment that's really used for integration and QA. You push to staging and check what breaks with your observability tools and your eyes. With each not-quite-prod-ready PR pushed, staging keeps drifting away from production or flat-out breaking. Then someone has to reset it. It's a massive bottleneck, it's usually expensive, and it's not even a reliable mirror of prod.
Shifting integration testing left to get rid of staging
The promise of AtomicJar is that integration tests now run locally, quickly, and with high fidelity. In turn, everyone writes and runs better integration tests as part of their regular dev loop. Ultimately, by "shifting left" integration testing (while adopting other practices such as canaries), teams get rid of their broken staging environment, gaining significant velocity and stability in the process. Testcontainers and the associated cloud offering deliver on a key promise of productivity and reliability made over a decade ago.
So, yeah, I'm pretty excited! :)
PS: If you agree, then check out the beta for Testcontainers Cloud or our open positions. If you think I'm wrong, please email me! I'd love to hear your arguments!