Nerd of the Month #5: A Castle of Glass
Or, rebuilding the Lightning Network's Foundations.
Whoops, we skipped July. It’s summer, everybody’s out—whatever. Anyway, we’re making up for it by delivering a longer-than-usual post by Spiral grantee Matt Morehouse, who is working on improving Lightning’s security and stability.
Prefer to listen to this post? Head to the read-aloud version on Apple Podcasts.
The Lightning Network is one of the most exciting areas of bitcoin research and development, combining diverse technologies to bring users fast, private, and cheap payments. Lightning nodes establish confidential communications via the Noise Protocol Framework, privately route payments using Sphinx-based onion packets, and maintain on-chain security through hash time-locked contracts (HTLCs). A constant stream of exotic new features is being implemented, from distributed data storage to arbitrary onion messaging and non-bitcoin assets.
All this excitement has attracted huge investments. Since 2018, more than half a billion dollars have been raised by Lightning startups, with larger companies investing millions into Lightning development every year.
But what, exactly, has all this investment produced? A castle of glass. It is magnificent to behold, yet it is dangerously fragile. When the storm comes, it won’t be in one piece for long.
The recent wave of security disclosures reveals a worrying reality: the very foundations of the Lightning Network are brittle. These are not isolated bugs but symptoms of a systemic "features-first" development culture. For the Lightning Network to survive and thrive, its architects—from individual developers to major corporations—must shift their focus from building new features to securing the ones we already have. We must move from a features-first to a security-first mindset.
The Debt of "Move Fast and Break Things"
In the rush to innovate and "move fast," things break and technical debt builds. Two recent vulnerabilities serve as perfect examples:
A denial-of-service (DoS) vulnerability in CLN involved a race condition that could be exploited to crash a node. This vulnerability was particularly alarming because a previous update knowingly introduced the underlying race condition to rush out support for multi-channel operations.
The `gossip_timestamp_filter` DoS vulnerability in LND, present since 2018, allowed an attacker to easily crash a node by forcing it to consume excessive memory. This critical flaw was introduced in a large pull request that received minimal security review.
In both cases, the rush to ship new features led to critical, long-lasting security oversights. The desire to build higher created cracks in the foundation.
Wanted: An Adversarial Mindset
Many of the most dangerous bugs in Lightning have been shockingly simple. The `gossip_timestamp_filter` DoS vulnerability mentioned above was not found as part of a detailed security audit; it was discovered accidentally while reading the code. An attacker only needed to send a few malicious messages to take a node offline.
Other simple and trivial-to-exploit vulnerabilities include:
A 2021 LND vulnerability, reported by Niklas Gögge, that allowed an attacker to crash a node simply by spamming it with `channel_update` messages.
A 2022 vulnerability affecting all major Lightning implementations that could be triggered by repeatedly opening fake channels, degrading performance and putting funds at risk.
These vulnerabilities weren't hidden in complex cryptographic logic but in plain sight. They slipped through because developers tend to assume users will behave as expected. This has to change. As developers, we must think like attackers and bake that adversarial mindset into our daily workflows. We must ask questions like:
How could this feature be abused?
What is the worst-case scenario if this code fails?
What core assumptions could an attacker violate here?
What's the worst-case resource utilization of this code?
This isn't about slowing innovation; it's about making it resilient and sustainable for the long term.
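As one illustration of the resource question, here is a minimal Python sketch of capping how many expensive requests a node will process at once, so that a flood of requests is rejected instead of growing memory without bound. The `ConcurrencyLimiter` class and its `try_run` method are hypothetical names invented for this example; this is not how any Lightning implementation is actually structured, only one possible shape for the idea.

```python
import threading


class ConcurrencyLimiter:
    """Caps how many expensive requests may be in flight at once (hypothetical)."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_run(self, handler, *args):
        """Run handler(*args) if a slot is free; otherwise reject immediately.

        Returns (True, result) on success and (False, None) when at capacity,
        so worst-case work is bounded by max_in_flight rather than by how many
        requests a peer chooses to send.
        """
        if not self._slots.acquire(blocking=False):
            return False, None
        try:
            return True, handler(*args)
        finally:
            # Always free the slot, even if the handler raises.
            self._slots.release()
```

Rejecting instead of queueing is a deliberate choice in this sketch: a queue that an attacker can fill is just another unbounded resource.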
Security's Unsung Heroes: Readability and Comments
Security isn't just about spotting flaws; it's also about writing code that leaves nowhere for flaws to hide. Clear, readable, and well-documented code is secure code. Bugs can fester for years when logic is convoluted or reasoning is left undocumented.
Two recent vulnerabilities in LDK illustrate this perfectly:
The subtle invalid claims bug could be exploited to either lock up the victim's funds or steal them outright. The complexity and lack of clarity in the surrounding code obscured the bug.
The duplicate HTLC failback bug could be exploited to force close all of a victim's channels. The bug was introduced seemingly because the reasoning for the original safe behavior was never properly documented.
A single, well-placed comment can be the difference between a secure implementation and a costly exploit. Prioritizing code clarity isn't just good practice; it's a fundamental security principle.
The First Rule of Security: Don't Trust, Verify
A foundational error in security is failing to validate untrusted external inputs. The infamous "onion bomb" vulnerability in LND is a textbook case. An attacker could craft a malicious onion packet with an oversized length field. When the victim's node tried to process this packet, it would attempt a massive memory allocation and instantly crash. The attack was cheap, anonymous, and devastatingly effective. The cause? A single missing check on an input field.
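To make that missing check concrete, here is a minimal Python sketch. It is not LND's actual code (LND is written in Go), and the toy packet layout, the `MAX_PAYLOAD_LEN` limit, and the names `parse_payload` and `InvalidPacket` are all assumptions made for illustration. The point is only the ordering: validate the attacker-controlled length before anything is allocated from it.

```python
import struct

# Hypothetical limit for illustration; a real implementation would derive
# its bound from the protocol specification.
MAX_PAYLOAD_LEN = 1300


class InvalidPacket(ValueError):
    """Raised when an untrusted packet fails validation."""


def parse_payload(packet: bytes) -> bytes:
    """Parse a toy packet: a 4-byte big-endian length followed by the payload."""
    if len(packet) < 4:
        raise InvalidPacket("packet too short for length prefix")
    (declared_len,) = struct.unpack(">I", packet[:4])

    # The kind of check whose absence enables an oversized-allocation crash:
    # reject the declared length before any buffer is sized from it.
    if declared_len > MAX_PAYLOAD_LEN:
        raise InvalidPacket(f"declared length {declared_len} exceeds limit")

    # Also reject packets that claim more bytes than were actually sent.
    if declared_len > len(packet) - 4:
        raise InvalidPacket("declared length exceeds bytes received")

    buf = bytearray(declared_len)  # allocation is now bounded by MAX_PAYLOAD_LEN
    buf[:] = packet[4 : 4 + declared_len]
    return bytes(buf)
```

Without the first check, a packet declaring a length of `0xFFFFFFFF` would ask this toy parser for a roughly 4 GiB buffer on behalf of an anonymous peer.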
Fuzz testing, a technique that automatically bombards interfaces with random data to find crashes, is an incredibly powerful tool for finding these bugs. Fuzz testing was used to find the onion bomb vulnerability and several recent invoice-parsing bugs in CLN. And fuzz tests are often easier to write than unit tests, so there is little excuse *not* to write them. At a minimum, every API consuming untrusted input data must be fuzz tested.
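For a sense of how little code a basic fuzz test needs, here is a deliberately naive sketch in Python using only the standard library. It feeds random bytes to a toy parser and treats any exception other than the expected one as a bug. The `parse_tlv` format, the `ParseError` type, and the `fuzz` helper are all hypothetical, and this is plain random fuzzing, not the coverage-guided fuzzing that production fuzzers perform.

```python
import random


class ParseError(ValueError):
    """The only exception the parser is allowed to raise on bad input."""


def parse_tlv(data: bytes) -> list:
    """Parse a toy stream of (1-byte type, 1-byte length, value) records."""
    records = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ParseError("truncated record header")
        rec_type, rec_len = data[i], data[i + 1]
        i += 2
        if i + rec_len > len(data):
            raise ParseError("record length exceeds remaining input")
        records.append((rec_type, data[i : i + rec_len]))
        i += rec_len
    return records


def fuzz(target, iterations: int = 10_000, max_len: int = 64, seed: int = 0) -> int:
    """Feed random byte strings to target; return how many inputs it rejected.

    Any exception other than ParseError propagates and fails the run, which
    is exactly the signal a fuzzer is looking for.
    """
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    rejected = 0
    for _ in range(iterations):
        length = rng.randint(0, max_len)
        data = bytes(rng.getrandbits(8) for _ in range(length))
        try:
            target(data)
        except ParseError:
            rejected += 1
    return rejected


if __name__ == "__main__":
    print("inputs rejected cleanly:", fuzz(parse_tlv))
```

A coverage-guided fuzzer would replace the random generator with one that mutates inputs based on which code paths they reach, but the harness around the target looks much the same.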
From Glass Castle to Fortress
Vulnerabilities like those described here are an existential threat to the Lightning Network. The onion bomb or `gossip_timestamp_filter` vulnerabilities could have taken over 90% of public routing nodes offline, allowing attackers to steal funds while the network was helpless. LDK's invalid claims bug or LND's excessive failback bug could have been used to silently siphon millions from large liquidity providers.
Many of these bugs were not hard to find; any script kiddie could have stumbled across the `gossip_timestamp_filter` or `channel_update` DoS vulnerabilities. An LLM probably could have one-shotted the attack programs for them.
Today's Lightning Network truly is a castle of glass, and unless we start rebuilding it now to withstand attack, it will eventually be shattered. LSPs and large routing node operators will go out of business when their funds are stolen. Users will lose all trust when they're randomly unable to send or receive payments for days at a time. Without any users, business models relying on Lightning will fail. Bitcoin adoption as a whole will take a major hit.
The Path Forward
This is a solvable problem, but it requires a cultural shift.
For Businesses: If you rely on a Lightning implementation, you must invest in its security. Assign dedicated engineers to audit the code you depend on and review new changes. Your business's survival may depend on it.
For Maintainers and Developers: We must champion a security-first engineering culture. This is more than just a checklist; it's a complete mindset shift.
Here is what that looks like in practice:
Prioritize Security Reviews: Embed adversarial thinking into every code review. Hire dedicated security engineers to hunt for flaws proactively.
Test Everything, Then Fuzz It: Implement comprehensive test suites for every bug fix. Aggressively fuzz test all code that handles untrusted input or manages complex state.
Invest in Simplicity: Write clear, readable, well-documented code. Complexity is the enemy of security.
Build for Resilience: Design systems that protect user funds even under a DoS attack. Prioritize protocol changes that simplify implementation, rather than adding to its complexity.
Feature development will slow down in the short term, but by investing in security now, we are ensuring that the Lightning Network will be resilient for years to come. We can take the medicine now, or we can learn the hard way when the castle shatters.



