The Spectre of Meltdown

I’m seeing so much “OMG, the Earth is doomed!” noise about Meltdown and Spectre, the recently-revealed Intel bugs, I just couldn’t resist adding my own.

I know some of you have managed to miss the fuss so far, so here’s a quick rundown of the problem: All Intel CPUs and some other manufacturers’ chips are vulnerable to one or both of a pair of recently discovered issues. That includes the Apple-designed chips in iPhones and iPads; many of the CPUs in Android phones; some, if not all, AMD CPUs; and every Intel processor from the Pentium* on.

* I find it ironic that the bug dates back to the Pentium. Turns out that chip’s early inability to do division was the least of its problems.

Both bugs are related to something called “speculative execution”. The brief explanation is that in order to give faster results, CPUs are designed to guess what work they’ll have to do next and work on it when they would otherwise be idle. If they guess right–and a huge number of engineering hours have gone into establishing how to guess and how far ahead to work–the results are already there when they’re needed. If not, the wrong guesses are thrown away.

The details are way too deep for this blog, but the upshot is that because the bugs are in the hardware, there isn’t any perfect fix possible. Meltdown can be patched around, but Spectre is so closely tied into the design of the chips that it can’t realistically be patched at all. It’s going to require complete hardware redesigns, and that’s not going to come soon. I’ve seen articles speculating that it could be five years before we see Intel CPUs completely immune to Spectre.

Personally, I suspect that’s insanely pessimistic. Yes, it’s a major architecture change, but Intel’s motivation is huge.

More worrisome is how many other hardware bugs are going to turn up, now that researchers are looking for them. Even if we get Spectre-free Intel chips this year–which is as optimistic as five years is pessimistic–the odds are overwhelmingly good we’ll see more such bugs discovered before the Spectre fix rolls out.

It’s also worth noting that the patches for Meltdown aren’t cost-free. According to Intel, depending on what kinds of things you do, you could see your computer running anywhere from five to thirty percent slower. Let’s be blunt here: if you mostly use your computer for email, looking at pictures, and web surfing, you’re not going to notice a five percent drop. You might not even notice thirty percent–but your workload isn’t going to be the kind that has a thirty percent slowdown*. The people who will get the bigger hits are the ones doing work that already stresses their CPUs: video processing, crunching big databases, serving millions of web pages, and so on.

* Unless some website hijacks your computer to mine cryptocurrency. But if that happens, you’d notice your computer slow down anyway.

So the bottom line here: Eventually, replacing your computer will be a good idea, but we’re not there yet. (And yes, given the speed and power increases we’re going to see between now and then, even if it’s possible to just upgrade the CPU, it’ll probably make more sense to replace the whole computer.) And in the meantime, unless you’re running a big server, do what you’ve been doing all along: keep your OS up to date with all the vendor patches, don’t run programs from untrusted sources, and if your search engine tells you a web site is dangerous, don’t go there!

On Proper Bug Reporting

A post for those of you readers who work in QA, especially those of you at my former place of employment who know my usual complaints about improperly logged bugs.

A recent bug report against the Chrome browser has been getting some press on the Internet. In case you’ve missed it, I’ll reproduce it below. You can see the original report and the resulting discussion in the Chromium bug tracker. The comments below the report were my first reactions on seeing it.

Issue 224182: Chrome wakes me up in the middle of the night, with monsters.

Chrome Version: 25.0.1364.172 m

What steps will reproduce the problem?

1. Before going to bed, enjoy the wonderful goodness that is watching an episode of Supernatural on Netflix. Some people might not like creepy TV shows before bed, but what can I say, I can't get enough of that show!

2. After the episode ends, turn off your monitor, crawl directly into bed, close your eyes, relax, think happy thoughts, and drift off into a nice peaceful sleep.

3. Around 3 AM when you're just sinking into the depths of your second full sleep cycle, perhaps Windows would like to install some Windows Updates and reboot your computer. This is of course no concern to you as you're fast asleep.


5. After getting out of bed and changing your pants, realize that after your computer restarted, Chrome helpfully re-opened all of your tabs, including Netflix, and so it restarted playing the episode of Supernatural that you watched before bed.

What is the expected result?
Ideally when Chrome restores your browsing state, it would also restore the state of plugins (e.g. my netflix player wasn't actually playing when Chrome shut down, so it shouldn't start playing when Chrome starts back up). This might require API changes to Silverlight / Flash so that Chrome can trigger them to save / restore state, so I don't know if that's feasible, but it would be nice!

Barring that, it would be nice if Chrome didn't run any plugins when restoring your tabs and instead displayed a bar at the top of the window, saying something like "Plugins were prevented from running while restoring tabs. [Start Plugins Now]"

I’d like to address a couple of problems with this report. Note: I won’t go into detail on items that may be specific to certain locations’ bug standards, such as the failure to provide a link to relevant documentation or attach error logs, screenshots, or (most relevant in this case) webcam captures.

First, remember that the steps to reproduce should be as concise as possible and include only the minimum number of actions necessary to allow the bug to be recreated. This report contains several unnecessary steps and does not correctly indicate all required actions:

  • Turning off the monitor and thinking happy thoughts are not required to arrive at a nice, peaceful sleep.
  • Windows restarting is a required step, yet the report does not make this clear. It suggests that “perhaps” Windows could install updates and restart. Eliminate the “perhaps” and make it clear that other causes of Windows restarting will also trigger this issue.
  • Changing your pants is not required to reproduce the bug. Also, keep in mind that not everyone wears pants when sleeping, and in fact, those who don’t wear pants will rate the severity of the bug higher, as they will need to change the sheets and perhaps the mattress as well, depending on how often the bug occurs.

Second, Step 5 is not actually a step to reproduce; it is the actual result. This should be more clearly delineated.

Third, the expected result is incorrect. The report states the reporter’s desired behaviour, rather than the behaviour defined in the appropriate requirements document.

Finally, the most important point is that this is not actually a bug and should be closed as “As Designed”. Had the user properly considered the development of the Internet as a communication tool, the history of Windows as an OS expected to restart frequently, and the intent of Chrome’s design, he would have realized that the sole purpose of the last 50 years of technical innovation was to scare the shit out of him.