Risky Thinking - Michael Z. Bell
https://www.riskythinking.com/

Reflections on Vibe Coding
https://www.riskythinking.com/articles/reflections-on-vibe-coding
2025-07-15

One of my hobbies is playing with micro-controllers - low-priced single-chip computers. I follow an interesting YouTube channel on the subject. Recently the channel’s creator has used vibe coding to create example code. How well did it work?

An Unhappy NTP Server

My interest in accurate time keeping probably dates from school days, when there were serious bragging rights to be earned from having an accurate watch which exactly matched the beeps of the Greenwich Time Signal on BBC Radio. Therefore, when a YouTube channel I followed showed how to create your own network time server using a cheap ESP32 micro-controller, I thought about building it. So I examined the AI generated example code.

I’ve tried to keep this at a high level, but to understand the problems you have to understand some of the details - and that is partly the point.

The code has to do two things:

  1. Read the time from the GPS/GNSS module and set the local clock to match it.
  2. Listen for an NTP query packet on the network interface, and construct a reply which includes the time when the query was received and the time when the reply was sent. (Using these, a client can calculate the round-trip delay and decide how to adjust its clock to best match the server’s clock - a sketch of that client-side calculation appears just below.)
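
This is not from the original code, but for context, here is roughly how a client uses those timestamps. A minimal sketch, assuming the four timestamps (client transmit, server receive, server transmit, client receive) have already been converted to plain seconds; a real client works with NTP’s 64-bit fixed-point format.

```cpp
#include <cstdio>

// T1 = client transmit, T2 = server receive, T3 = server transmit, T4 = client receive
struct NtpSample {
  double t1, t2, t3, t4;   // all in seconds, already converted from NTP format
};

// Total time spent on the network, excluding the server's processing time.
double roundTripDelay(const NtpSample &s) {
  return (s.t4 - s.t1) - (s.t3 - s.t2);
}

// Estimated amount by which the client's clock lags the server's clock.
double clockOffset(const NtpSample &s) {
  return ((s.t2 - s.t1) + (s.t3 - s.t4)) / 2.0;
}

int main() {
  NtpSample s{10.000, 10.012, 10.013, 10.030};   // made-up example values
  std::printf("delay=%.3fs offset=%.3fs\n", roundTripDelay(s), clockOffset(s));
}
```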

The AI generated code:

  • Created a task which waited for a report from the GPS module, parsed it to get the time, waited in a “busy loop” for a variable to be set indicating a new second had started (see below), then set the local time.
  • Created a task which waited for an NTP packet on the network interface, constructed a reply including the time (from the local clock) when the packet was received and the time (from the local clock) when the reply was prepared, and sent the reply.
  • Created a mutex - a synchronization primitive that can be used by a program to prevent two tasks trying to read and write the same variables at the same time.
  • Created an interrupt handler which, every time the GPS module indicated the start of a new second, set a variable to true.

The code “works”, in that it compiles and would give a syntactically correct reply if sent an NTP packet. It looks plausible.

But the code is wrong. Very wrong. And looking at how it is wrong gives some indication of the dangers of “vibe” coding.

  1. Although the code declares a mutex to prevent the two tasks from accessing the same variables at the same time, it isn’t ever used. Where is task synchronization needed? The shared variable isn’t in this code, it’s in the system time library. (This is possibly why the AI generated code misses it.) What happens if one task is setting the time while the other is reading it? It’s undefined. Will this happen often? No. It might show up in a long test, but it will probably not show up in any simple test. (The first sketch after this list shows one way to close this gap.)
  2. What happens on power up, before the GPS module has received its first fix? The time will be (unless we add a real time clock) shortly after 1 January 1970 - zero (uninitialized) time for the time libraries. This bogus time will be served on the network until the GPS gets its first fix, possibly minutes later. (The first sketch below also guards against this.)
  3. The GPS code waits for a variable to be set by the interrupt handler in a busy loop. There are a number of problems here (the second sketch after this list shows one way to restructure this):
    • The variable is not declared (in C++) as volatile. This means that a compiler can assume its value is not being changed by another task and keep its value in a register rather than reading it from memory. So the time task might loop indefinitely depending upon the compiler being used and the options given to the compiler.
    • Simple busy loops, where the code repeatedly tests whether a variable is true, are a bad thing: the processor is prevented from doing useful work while the task is looping. This may cause some unnecessary delays in the other task.
    • There’s an off-by-one error in the timing. The task waits for the start-of-second variable to become true after reading the time from the GPS module, then uses the time previously read from the GPS module. This is the start of the next second, not the start of the second whose time was read.
    • GPS signals are not perfect. They depend upon moving satellites and can be affected by nearby objects etc. A GPS receiver can lose lock, in which case the next start-of-second can be seconds (or minutes or hours) after the previous one. This results in the clock being set incorrectly until a new signal is received.
  4. The interrupt service routine didn’t use the correct type of memory, and didn’t use the best method to communicate with the rest of the program. Micro-controllers have a confusing array of different types of memory, and code that is run in interrupt handlers should be placed in a specific type of memory. There’s also a specific primitive an interrupt handler should be using to communicate with a task (which eliminates the busy-wait loop), but that wasn’t used here (see the second sketch after this list).
  5. There’s a questionable choice of hardware: the ESP32 natively uses WiFi - which isn’t really suitable for NTP time servers because there’s a significant amount of jitter (unpredictable delay) on a WiFi network due to other traffic or radio interference. I think this is acceptable in this case (cheap and widely available hardware for teaching purposes) but wouldn’t be a good idea in an area with multiple WiFi networks.
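
To make the first two problems concrete, here is one way they could be addressed on the ESP32. This is a minimal sketch, not the original code: the function names are mine, and it assumes the underlying settimeofday()/gettimeofday() calls are not already internally synchronized. The idea is to actually use a mutex around every access to the system clock, and to refuse to serve time at all until the GPS has delivered a first fix.

```cpp
#include <sys/time.h>
#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"

static SemaphoreHandle_t timeMutex;    // guards the system clock and the flag below
static bool timeValid = false;         // becomes true after the first GPS fix

void setupTimeLock() {
  timeMutex = xSemaphoreCreateMutex(); // call once at startup
}

// Called by the GPS task once it knows the correct time.
void setSystemTime(const timeval &tv) {
  xSemaphoreTake(timeMutex, portMAX_DELAY);
  settimeofday(&tv, nullptr);
  timeValid = true;
  xSemaphoreGive(timeMutex);
}

// Called by the NTP task. Returns false until the clock has been set,
// so the server can simply drop queries instead of answering with 1970.
bool getSystemTime(timeval &tv) {
  xSemaphoreTake(timeMutex, portMAX_DELAY);
  bool ok = timeValid;
  if (ok) gettimeofday(&tv, nullptr);
  xSemaphoreGive(timeMutex);
  return ok;
}
```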

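And a sketch for problems 3 and 4, again illustrative rather than the original code: the interrupt handler is placed in IRAM and wakes the GPS task with a direct-to-task notification, so there is no busy loop and no shared flag; the clock is set to the second after the one just parsed; and a timeout stops a lost GPS lock from setting the clock with stale data. readAndParseGpsSentence() is a placeholder for the NMEA parsing in the real program, the pin number is invented, and setSystemTime()/setupTimeLock() come from the previous sketch.

```cpp
#include <Arduino.h>
#include <sys/time.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

static TaskHandle_t gpsTaskHandle = nullptr;
constexpr int PPS_PIN = 4;                      // hypothetical wiring

time_t readAndParseGpsSentence();               // placeholder: blocks until a time message arrives
void setSystemTime(const timeval &tv);          // from the previous sketch
void setupTimeLock();                           // from the previous sketch

void IRAM_ATTR ppsIsr() {                       // ISR code placed in IRAM
  BaseType_t woken = pdFALSE;
  vTaskNotifyGiveFromISR(gpsTaskHandle, &woken);// wake the GPS task; no polling needed
  if (woken == pdTRUE) portYIELD_FROM_ISR();
}

void gpsTask(void *) {
  for (;;) {
    time_t parsed = readAndParseGpsSentence();  // time of the second that has just passed
    ulTaskNotifyTake(pdTRUE, 0);                // discard any PPS edge that fired while parsing
    // Block until the next PPS edge; give up if it doesn't arrive promptly
    // (e.g. the receiver has lost lock) rather than setting a stale time.
    if (ulTaskNotifyTake(pdTRUE, pdMS_TO_TICKS(1500)) == 0) continue;
    timeval tv;
    tv.tv_sec = parsed + 1;                     // the edge marks the start of the *next* second
    tv.tv_usec = 0;
    setSystemTime(tv);
  }
}

void setup() {
  setupTimeLock();
  xTaskCreate(gpsTask, "gps", 4096, nullptr, 5, &gpsTaskHandle);
  pinMode(PPS_PIN, INPUT);
  attachInterrupt(digitalPinToInterrupt(PPS_PIN), ppsIsr, RISING);
}

void loop() {}
```
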
I think you get the idea.

Personal Tests

But perhaps, I thought, I’m being unfair to the human joint-author of this code. He doesn’t claim to be an expert programmer. I therefore experimented with some “vibe coding” myself, using Google’s gemini-cli for comparison. I tried a number of simple Python coding tasks, from adding functionality to an existing script to generating a simple command line application. The results were mixed:

  • The simple case of adding functionality worked excellently, except that I had to guide gemini as to where to find some file format documentation: its general web search was inconclusive. There was a package it should have used to format the CSV file, but its own implementation worked in this instance.
  • The second attempt had to be abandoned: gemini hallucinated what a documented function did and built an entire program around using it. When I tested the resulting code it produced runtime errors, and a quick investigation showed that the documentation of the Python package it was using had been completely misunderstood. The assumption was a fundamental error, and as a result none of the generated code was remotely useful.
  • The third attempt produced incorrect code which could at least be modified into something usable. Once again gemini misinterpreted the specification it was using and chose the wrong values, this time giving plausible but incorrect output. Along the way it wasted a lot of time trying to use XML namespaces (which weren’t being used). It also decided to delete the database it was creating each time the program was run - an error which might have allowed test cases to pass, but would have serious consequences if the code were ever used in production. In the end I had to make several code corrections by hand.

However, it was a great deal of fun and it felt as if it was productive…

To sum up…

The “vibe” code worked, except that:

  • It didn’t handle startup conditions correctly.
  • It didn’t handle error conditions correctly.
  • It didn’t handle race conditions between different tasks.
  • It used undefined C++ behavior and in doing so made unwarranted assumptions about how the compiler would behave. A compiler change could cause the code to stop working.
  • It didn’t understand the specification of the GPS module it was using, resulting in an off-by-one-second error even when it worked as intended.
  • In general, it didn’t understand online documentation in sufficient depth.
  • It displayed a tendency to hallucinate functions or function arguments, and to assume it knew what the function did.
  • There was a complete lack of questioning the wider scope: in the GPS case, could this hardware choice ever give the required performance?

But in the GPS case the code did look plausible and would probably pass some simple tests. Apart from the off-by-one-second error, the faults would be unpredictable and intermittent - i.e. it produced the type of faults which are most difficult and expensive to find and diagnose, especially after deployment.

The code would probably fool an inexperienced programmer or someone who was unfamiliar with the hardware specifications. And that’s the problem. Does the person doing “vibe coding” actually know enough to know what they don’t know?

Resources

  • It’s easy to feel that “vibe coding” makes you more efficient: at least one randomized controlled trial suggests that the opposite is the case.
  • I don’t think it would be fair to mention the YouTube channel or GitHub repository containing the code here. The moral is to make sure you understand the code in depth before trusting it.




Small Fire, Large Outage
https://www.riskythinking.com/articles/small-fire-large-outage
2023-08-01

I tried to log on to our VoIP provider’s website today. The website wasn’t working properly. A small fire had ignited at the datacenter they used. It was quickly extinguished, and the datacenter had redundant generators and backup systems. So why, more than twenty-four hours later, was the website still down?

Small fire in a data center as imagined by NightCafe Studio

There are two business continuity plans at work here: the VoIP provider’s and the data center’s.

The VoIP Provider

From the VoIP provider’s perspective this should have been a quick (if not automated) switch-over to a different data center with partial (if not full) functionality. It’s clear that the possibility of a complete data center failure was either (a) forgotten, (b) ignored, or (c) judged to be too costly to fully mitigate for the probability of it happening.

Whatever the reason, it doesn’t make them look good. And the fact that I only found out about their problems by encountering them myself rather than being warned in a helpful customer email or in a statement on their website does little to inspire confidence.

Even a rudimentary business continuity plan should include warning customers about service problems.

So the VoIP provider is handling the outage badly. They are cheap and full-featured, but I can no longer say that they are reliable. I won’t recommend them any more, and I will be looking for a suitable replacement.

But what of the data center? Why isn’t it back yet?

The Data Center

Fortunately (if you know the right place to look and the right discussion board to follow) you can find out that the data center they used actually cared about notifying its customers. Here’s what happened:

Power remains off at our data center in REDACTED per the local fire marshal.

We have had an electrical failure with one of our redundant UPS’ that started to smoke and then had a small fire in the UPS room. The fire department was dispatched and the fire was extinguished quickly. The fire department subsequently cut power to the entire data center and disabled our generators while they and the utility verify electrical system. We have been working with both the fire department and the utility to expedite this process.

We are currently waiting on the fire marshal and local utility to re-energize the site. We are completely dependent upon their inspection and approval. We are hoping to get an update that we can share in the next hour.

At the current time, the fire department is controlling access to the building and we will not be able to let customers in.

And this is what business continuity plans often fail to take into account when considering the risk of a small fire. If it’s electrical, the priority of the fire department and the local electrical utility is safety. The fire may have been small. The fire may have been swiftly extinguished. But the fire marshal’s job is to ensure safety. That means shutting off all electrical systems and not taking any chances.

And when everything is shut down, and after repairs are made, it takes a surprisingly long amount of time to switch everything back on and get it working correctly. Here’s an update from twenty-four hours later:

The REDACTED data center remains powered down at this time per the fire marshal. We are continuing with our cleanup efforts into the evening and working overnight as we make progress towards our 9AM EDT meeting time with the fire marshal and electrical inspectors in order to reinstate power at the site.

Once we receive approval and utility is restored, we will turn up critical systems. This will take approximately 5 hours. After the critical systems are restored, we will be turning up the carriers and then will start to turn the servers back on.

The fire marshal has requested replacement of the smoke detectors in the affected area as well as a full site inspection of the fire life safety system prior to allowing customers to enter the facility. Assuming that all goes as planned, the earliest that clients will be allowed back into the site to work on their equipment would be late in the day Wednesday.

The points to note here are that:

  • After a fire, even a small one, you are no longer in control of your building: the Fire Marshal is.
  • The Fire Marshal’s priority is safety. Electrical systems will be switched off if there is any possibility that they present a hazard to firefighters or have been damaged in a fire.
  • Electrical systems have to be cleaned, repaired, and inspected before they can be re-energized.
  • Smoke detectors and other fire safety system components may need to be replaced and the entire fire safety system may need to be checked before normal access to the building is allowed.
  • After everything is completed, it can take a long time (in this case at least five hours) before critical systems are all working again.
  • Even after this, customer systems will still need to be restored.
  • In this case data loss was limited to that caused by systems being unexpectedly powered off. Hopefully this was taken into account in the design of all the application programs.

TL;DR? When planning remember that even small fires with limited damage can have major consequences.


Phishing Airline Customers Made Easier
https://www.riskythinking.com/articles/phishing-airline-customers-made-easier
2023-02-28

Recently I received an email from Southwest Airlines. There was a reward of $100 for completing a survey. It was a straightforward phishing attempt, but Southwest made it easier than it should have been for the criminals. Are you making it easy too?

Southwest Airlines Survey Email

I received an email survey today. Not surprising. Many companies send them out. This one was from Southwest Airlines, and it offered me a reward of $100 for completing their survey.

Except it wasn’t. I knew this immediately as I have never flown on Southwest Airlines.

This was a phishing email directing me to a plausible sounding survey website. The fraudsters hadn’t bothered to get a free SSL certificate for their plausible domain, perhaps because newly issued certificates appear in the Certificate Transparency logs, which can trigger alerts.

So should I warn Southwest Airlines? I went to their website. Was there an email address or form to fill in to let them know about the website “southwestairlinessurvey.today” phishing their customers? No, there wasn’t. Plenty of ways to report lost baggage, but no obvious way to report an issue to their security team - assuming they have one. I could have found out how long their customer service queues were by calling their 1-800 number, and found out whether their customer service agents knew how to contact their security team, but I didn’t.

I’m neither a customer nor a shareholder, so in a literal sense it’s none of my business. I will do things that are easy to do if it helps make society better. But I won’t put in a lot of effort to help a company that doesn’t make it easy to help them. I suspect I’m not alone in this.

If you care about people phishing your customers, make it easy to report it: otherwise you will only hear about it very much later from disgruntled customers who believe you cheated them out of rewards, goods, or services.


Power Outages and Strategy 1.5
https://www.riskythinking.com/articles/power-outage-strategy
2023-02-14

An attack at the end of last year put two electrical substations out of action for a number of days, leaving 40,000 customers without power. Such attacks are increasing. Do you need a better plan for dealing with power cuts?

Sign on power station: Warning Armed Responders

On 3 December 2022 somebody attacked two electrical substations, putting them both out of action for a number of days. 40,000 customers were without power. The culprits had damaged parts which were difficult to repair or replace.

There were claims that the attack was sophisticated, that the attackers knew exactly what to hit with a high powered rifle to cause maximum disruption. How else to explain the amount and length of disruption caused? There were also claims the intention of the attackers was political: to disrupt a local drag festival.

But this is backward reasoning: these were the effects, therefore this must have been the intention. One plausible theory I read claimed that this was no more than an attempt to cut off local power in order to burgle a store, with unfortunate collateral damage. Grady Hillhouse’s Practical Engineering YouTube channel has a good video describing the damage done and the work required to restore the grid.

What is certain is that such attacks are on the increase. In November 2022 the FBI warned of white supremacist plots to take down the US power grid, and that the information required to identify vulnerable substation components was being published by various groups. In 2014 a report by the Federal Energy Regulatory Commission warned that attacking just nine of the 55,000 US electrical substations could cause a national blackout.

Most businesses have some sort of plan for what to do when the power fails. The general assumption is that a power cut will be caused by bad weather and be of limited duration. The five day power cut in this case reminds us that power utilities only keep spares for common failure modes - not spares for all possible attacks. Large transformers (such as the one damaged here) are built to order and have long lead times, often over a year. If, as some have suggested, there is a real upward trend in attacks against equipment, the risk of multi-day power cuts is something that should be considered.

There are two basic power outage strategies which I’ve seen in business continuity plans:

  • Strategy 1: Install a backup generator so normal operations can continue without grid power. This is expensive, and only economic in some circumstances. Backup generators need to be regularly maintained and tested; even stored fuel requires maintenance.
  • Strategy 2: Accept the risk and send everybody home after an hour. Half a day’s lost productivity once every few years may be quite acceptable compared to the costs of providing backup power.

These strategies are commonly combined into a hybrid approach: keep critical functions (e.g. call centers, refrigerators and freezers) working with backup power, but send less critical or more power intensive departments (e.g. marketing, manufacturing) home.

Strategy 2 works well only if we can assume power cuts are rare and of limited duration. But what if, as in this case, they have the potential to last a number of days?

This is where Strategy 1.5 comes in.

  • Strategy 1.5: Accept the risk of a short duration outage, but mitigate the risk of a multi-day outage with provision for hooking up a mobile generator if needed.

A permanent backup generator may be too expensive to install and maintain, but provisioning for a mobile generator might not be. Truck or trailer mounted generators can supply up to 2MW at relatively short notice and be shipped long distances as needed. I’ve seen this strategy work well: mobile generator provisioning originally installed to mitigate Y2K risks was used years later to limit disruption during a multi-day power outage.

Does Strategy 1.5 make sense for you? It’s worth investigating the costs and doing a few rough calculations to find out.
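
As a purely illustrative back-of-envelope calculation (every number here is invented): suppose a multi-day outage is judged to have a 2% chance in any given year and would cost $150,000 in lost business, while short outages are already acceptable under Strategy 2. The expected annual loss from long outages is then 0.02 × $150,000 = $3,000. If a transfer switch and an external generator hook-up can be installed for $15,000, and rented generator capacity obtained within a day when needed, the provision pays for itself in roughly five years of expected losses - far sooner than a permanently installed generator costing several times as much. Your numbers will differ, but the arithmetic takes minutes.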


Santa's Ransomware Incident
https://www.riskythinking.com/articles/santas-ransomware-incident
2023-01-31

The recent ransomware incident at the North Pole was kept pretty quiet. In our exclusive interview Santa tells us what happened, and the unusual steps his organization was able to take after the incident occurred.

Santa after a tough Christmas

I’d arranged to meet Santa at the North Pole Pub. Most customers had left and the bar was almost empty. He was sitting at his usual table in the corner by the door, an assortment of bottles and glasses in front of him.

“Ho, ho, ho” he greeted me, but with little enthusiasm. His suit was dirty and crumpled, and even his white beard was looking a pale shade of grey.

“Tough Christmas?” I asked.

He looked down at his half-empty glass, as if the answer might be there. “The worst”, he replied. “Very, very stressful”.

And then he told me what had happened.

“It was ransomware. Bloody ransomware. We’ve been hit by it before and no doubt we’ll be hit by it again. But this time it was particularly bad.”

He looked up from his glass and looked me straight in the eye. “As you know, we’ve carefully segmented and firewalled all our networks. We don’t accept emails, which limits our exposure to phishing attacks. All our elves are well-trained - thanks to your Plan424 system each of them knows exactly what to do in any emergency. And when somebody attacks us, they generally go after what they think is our most important asset. The List.”

“The List isn’t the most critical asset?” I hadn’t given much thought to Santa’s IT assets before.

“Not even close. Outsiders think the Naughty and Nice list is our most important asset, but it isn’t. Yes, we keep it carefully backed up. But it doesn’t change that often. If we were forced to re-use last year’s list I doubt anybody would notice. I could offer The List for sale on the dark web and nobody would buy it. You could even publish it on the Risky Thinking website and everybody would just say it was a fake list of names. It’s really not that critical.”

“So what did they go after?” I asked. “If you tell me what happened maybe our readers can avoid similar problems.”

He paused, and I could tell from his distant expression that he was deciding carefully just how much it was safe to tell me. Then he replied.

“Well, there are quite a few systems that are more important than The List. There’s the intelligence gathering system our informants use to tell us who is naughty and who is nice. It would be embarrassing if people found out too much about how that worked. But - like the list - it’s not particularly time critical. Then there’s the ERM system: organizing the worldwide production of millions of presents is hard. That would normally be critical, but by Christmas Eve production is long finished. And of course there’s the payroll system: our elves may like working here but they also like to be paid.”

Santa took another long sip of his beer. “But this was a Bad Elf situation. They knew exactly which system to go for.”

“A Bad Elf situation? You mean an insider attack?”

“Precisely. There’s one system we can’t do without on Christmas Eve. The one system they knew we couldn’t do without. The SSS. The Sleigh Scheduling System. There’s no way we can deliver presents all over the world in an unbelievably short period of time without it. Insiders know what hurts the most. And our insider knew that at Christmas this was the most critical of all systems.”

“So what did you do?”

“Well we knew from the start it was an inside job. We employ strict least-privilege principles, and our systems are carefully air-gapped and firewalled. So we knew it had to be an elf who worked on the Sleigh Scheduling System. We also knew that whatever we did with our regular staff might be leaked to the extortioners, so we moved our negotiations to a team of outside professionals. We delayed as much as we could, but the extortioners knew we had a deadline which we couldn’t afford to miss. If we didn’t deliver by Christmas Day, nobody would ever believe in us again. So we negotiated the amount down as much as we could and then paid the ransom. We had no other choice.”

“Did you have insurance? I know that some companies have insurance for this sort of thing.”

“Cyber-insurance for ransomware is increasingly expensive. And insurers regard the North Pole as a very high risk location. So we had chosen to accept the risk ourselves.”

“So you paid them?”

“We didn’t have any choice at the time. But we do have a couple of advantages that the extortioners hadn’t reckoned with. Advantages that other organizations don’t have. We have the best intelligence system on the planet. We know exactly who is naughty or nice and why. So it didn’t take us long to identify the criminals we were dealing with. And after a discreet discussion with one of our bigger elves they decided to give the ransom back. It seems that nobody wants their children and their children’s children put on the Naughty List. Criminals just don’t think of that when they try to threaten us.”

“And the Bad Elf?”

He smiled. This was safer territory. “We moved him to where he couldn’t do any harm. We also got him into rehab. There’s a lot of gambling at the North Pole, and he was desperately trying to pay off his debts. That made him an easy mark for cybercriminals. In other circumstances he would have been a Good Elf, but with gambling debts piling up the temptation was just too much.”

With that Santa drained his pint and stood up. He seemed happier now that he had told someone what had happened. I offered to buy him another drink, but he declined… “Can’t be seen drunk in charge of a sleigh” he grinned as he pulled open the door and disappeared into the frigid polar night.


