Mars Polar Lander: A Different Bug!


Distances in space can seem unfathomable. Each planet comes with a different set of mathematical challenges. The gravity of Jupiter is so huge that it’s slowly tearing its moons apart via tidal forces, for example, while Mars is so small it’s barely holding onto its own atmosphere.

Even if they were in the exact same spot, getting a probe to the moon is going to be wildly different than getting one to the atmosphere of Mars. Orbital physics are so incredibly complicated that it’s a miracle anything makes it anywhere at all! And yet, we’ve landed things on Mars, on the moon, on Venus, on asteroids – and technology makes it just a little easier every time.

Unfortunately, the Mars Polar Lander was not one of those times. It thought it had already touched down and shut off its engines. It was actually still far enough above the surface that the engine-less impact destroyed it, less than three months after the loss of the Mars orbiter. Talk about an unfortunate year for Mars projects!

Mars Orbiter

This lander was supposed to work in tandem with the Mars orbiter. Unfortunately, that had burnt up in the atmosphere earlier that year. Luckily, the teams in charge of the Mars Polar Lander had already designed multiple redundancies into the system, and it could still send messages back to Earth without the orbiter. It was disappointing to have lost their communications relay, but not world-ending. The sheer number of telecommunication devices built into this robot makes its totally preventable death sting worse, but NASA (at the time) was struggling with its chain of command and funding – the lander had to work, or else…

Or else leadership was wrong, and the issues were upper management, not lower.

New initiatives meant that some of the work was being outsourced to the lowest bidder, which – once again – was Lockheed Martin. It’s important to note that in the orbiter’s case, the only thing that killed it was that metric-to-imperial conversion. If it had handled that, there’s no reason to believe the construction would have failed. The same goes for this lander: as far as we know, the issue was a software issue, not a construction one.

Falling Out of the Sky

Once again, the lander was supposed to lose communications as it got into position to land and reconnect afterwards, just like the orbiter. Entry went fine, as far as anyone could tell! The lander didn’t burn up in the atmosphere or get shaken to pieces, and all the equipment was intact – everything was going perfectly fine. Right up until a software bug stopped its thrusters too early, and it hit the surface of the planet at extreme speed. It was carrying two probes with it that were designed to hit the planet at about 200 meters/second, and those didn’t survive the crash either. That’s how fast the thing was going when it hit the ground!

At least, that’s what they think happened. The lander’s communications were interrupted by landing, as planned. All NASA had was their previous tests, which did their best to simulate landing on the surface of the planet.

What (Theoretically) Happened?

The propulsion system was only supposed to stop once it detected a signal from the legs, which would only send that signal once it had touched down. However, one of the legs was sending out bad signals, and the lander shut off its main engines prematurely. It misinterpreted the drag and jostle of moving the legs into position as coming into contact with the surface of the planet.

Once it was sending the signal, it would continue to send it. The machine couldn’t correct itself, and other sensors for location (like the Sun and star trackers) weren’t of much use so close to the planet’s surface. Testing had shown this could happen, but they hadn’t expected it to – it was a bug in the code itself, and it didn’t happen every time. Some say the test sensors were wired incorrectly, but this is a theory and not something NASA verified.
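The latching behavior described above can be sketched in a few lines of Python. This is an illustration only, not the lander's actual flight software: the `TouchdownMonitor` class, the `arm` step, and the `reset_on_arm` flag are all hypothetical names made up for this example. The idea is that a contact flag which stays set once triggered will still be set when the lander finally starts trusting its legs, unless something clears it first.

```python
from dataclasses import dataclass


@dataclass
class TouchdownMonitor:
    """Hypothetical touchdown monitor (an assumption, not real flight code)."""
    reset_on_arm: bool     # assumed safeguard: clear stale contact flag when arming
    latched: bool = False  # once set by a leg signal, stays set
    armed: bool = False    # True once the lander starts trusting its legs

    def sample(self, leg_signal: bool) -> None:
        # Any contact reading sets the flag; later readings never clear it.
        if leg_signal:
            self.latched = True

    def arm(self) -> None:
        # Begin acting on the legs; optionally discard anything seen earlier.
        if self.reset_on_arm:
            self.latched = False
        self.armed = True

    def should_cut_engines(self) -> bool:
        return self.armed and self.latched


def simulate(reset_on_arm: bool) -> bool:
    """Return True if the engines would be cut while still in the air."""
    monitor = TouchdownMonitor(reset_on_arm=reset_on_arm)
    monitor.sample(True)   # jolt from the legs moving into position
    monitor.sample(False)  # still descending, no contact
    monitor.arm()
    monitor.sample(False)  # still no real contact
    return monitor.should_cut_engines()


print(simulate(reset_on_arm=False))  # stale signal survives arming
print(simulate(reset_on_arm=True))   # stale signal is discarded
```

In this toy model the only difference between the two runs is whether the stale signal is thrown away before the engines are allowed to respond to it.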

The bug was noticed, and obviously they should have fixed it when they could. However, fixing it would have been a huge undertaking, and NASA was in a hurry to get this thing out the door. They bet against it happening. Little bug + lot of work to fix it = ignoring the issue.

Much like the Challenger’s O-ring failure, if they’d had the complete picture, they might have held off on launch. Nobody with the authority to delay wanted to, and nobody who wanted to delay had the authority.


NASA spent some time looking for the polar lander with the next Mars probe they sent out… but it was never officially found. Without communications, finding the pieces was like trying to find a needle in a haystack! The microprobes for surface-testing were supposed to land about 50 seconds before the actual lander did, but they’d be an estimated 62 miles away when they hit. Those are the kinds of distances they were dealing with. The lander itself definitely flew off-course as soon as the engines cut.

Remember that Malaysia Airlines flight that was never found? They still don’t know for sure where it actually hit the water, even though it was going much slower. The range the plane could have flown in the hours it was missing is impossibly huge – basically the entire Indian Ocean would need to be searched. Granted, people searching for parts to recover also had to deal with the ocean moving and sinking them, but the Mars Polar Lander is in a very similar situation. The lander could be basically anywhere in the top latitudes of the planet. It would take years to find it without the precise data lost in the crash.


The lander failure points towards more rigorous testing procedures. There’s a saying in tech: ‘if you haven’t tested the backup, then you don’t have a backup’. All the redundancies in the world can’t save you if an essential piece of the operation is missing. All those antennas, all that work, lost because there was nothing else to detect distance from the planet itself. No backup. Only legs.

NASA’s Faster-Better-Cheaper initiative, meant to cut down on administrative and material waste, was brought up to the folks in the post-crash conference several times. Two projects were sent out cheaper than previous projects, but not cheaply, and both had failed for seemingly stupid reasons. Why?


Contractors. And a failure to test.


A big part of the issue was miscommunication. Information was passed up the chain of command, but the message would be different by the time it finally made it all the way up – huge issues were reduced to minor problems, and therefore nothing would happen. ‘Little’ issues, like that orbiter’s misreading of its data, never made it upwards: the orbiter that went with this project died because the manufacturer didn’t translate the force measurement to metric, and it was noticed, but the engineers were so low down on the chain that it never got passed upwards.

The contractors involved in production of probes and the like were also part of the problem. The two microprobes allegedly weren’t tested too thoroughly for launch, but if they had been, NASA might have gotten something out of the lander’s failure. The lander itself was tested, but not thoroughly enough. Surely, the other hundred engineers on the project would have noticed something was wrong. But they didn’t, not officially. Concerns fell on deaf ears. Forms were dismissed for being incorrectly filled out. The issue was in the administration of the project, not the physical lander.

Some say that Lockheed had bid too low for what it would actually cost to make the lander, and that they’d cut corners as a result. NASA denies this, and based on everything I’ve found, they’re right: there’s no reason to believe it was anything but the sensors and a software bug. The legs themselves were fine, but the programming surrounding the landing equipment wasn’t tested as rigorously as it should have been. If the leg hadn’t kept sending the signal after it deployed, the lander might have made it; if NASA had delayed launch to fix the issue, the lander might have made it; or if NASA had spent more time on the lander’s ground-response, it might have made it.

There’s a lot of ‘ifs’, and that’s an issue when projects cost millions to make and launch.


Mars Orbiter: Measuring Matters

Elizabeth Technology April 5, 2021


Picture the time: it’s 1999, and we’re in a new era of peace. We’re between major wars, communism has been cornered in small pockets, the economy’s doing fine, the Berlin Wall fell some time ago, and the hole in the ozone layer at least isn’t getting any worse. And 9/11 hasn’t happened yet.

It really felt like this was the era we’d get to Star Trek levels of technological advancement! The future was bright and ready for humanity to take the next big leap forward.

Enter: NASA. We’d landed on the moon, we’d launched satellites, and we’d sent landers to Mars. Now, we were going to send more, with better tech and better tools. It was still kind of a flex of US propulsion systems, but it was more about the learning we’d get from it.

Mars is a popular target for many reasons. The primary one is the lack of acid clouds and hellfire found on Venus – landers sent there don’t last very long at all. Besides that, Mars is more likely to support life. It has some amount of frozen water on the surface, a thin atmosphere with weather, and some movement in the mantle. If there’s anywhere people could live besides Earth in the solar system, it’s probably going to be Mars. Moon bases are a cool idea, but the moon doesn’t have nearly as much water as Mars does, and no atmosphere is a big downgrade from thin atmosphere. Not to mention things like gravity – astronauts who spend a lot of time in space come back with weaker bones and elongated spines. The moon will make you taller than Mars will, but you’ll also get weaker, faster.

The Mission

The orbiter being sent this time was meant to analyze the atmosphere of Mars, safely, from a high orbit. The orbiter was assembled by both NASA and outside contractors who could make the specialized equipment, and officially nothing had gone wrong during the building process. In reality, a piece of software made by Lockheed Martin was delivering results in the wrong units: American pound-force seconds instead of the newton-seconds used by the rest of the craft.

Lockheed had been sending their data like this the entire time. The NASA team assembling the orbiter was aware of this and was translating their data into the correct format, but they hadn’t yet programmed this into the craft. Final assembly came and went, and officially, no issues were reported. Unofficially, a couple of software engineers on the project had discovered an issue with data sent to the orbiter during tests, but – allegedly – didn’t fill the forms out correctly, so the problem was dismissed.

The orbiter was launched into space. Several months passed, and all was going according to plan. Everything at that point relied on other, in-house software without the conversion problem, so the launch went fine, the thing was on the right trajectory, etc. When the orbiter was sent out on its own, however, engineers discovered a problem they couldn’t fix: the thing wasn’t where they thought it would be. It was very close to Mars, at the lowest altitude it could be and still survive, so engineers didn’t panic right then and there. They didn’t know about the software bug. They began corrective maneuvers, but lost contact with the orbiter as it disappeared behind Mars. Anxiously, NASA waited for it to reappear, but it never did. They’d lost a project.


The silence in that room must have been deafening.

The AfterMath

Wikipedia calls its death “unintentionally deorbited”, which is kind of funny. NASA accepted full blame for the project’s failure, and essentially said their quality control hadn’t been up to snuff. Errors detected when the orbiter was still on the ground went ignored, and Lockheed was supposed to be converting its software to metric before it sent it to NASA (allegedly – Simspace mentions this, but NASA’s release does not).

Ultimately, the unit the contractor’s software was using was about 4.45 times larger than the one the rest of the craft was using. The software reads some measurement it’s getting from its thrusters vs. its real position, and panics, because it can tell it’s not where it’s supposed to be. It gets closer and closer to the planet, and all the while it’s still panicking because now it’s not the right speed or at the right place. Eventually, Mars consumes it, like it consumes everything we send it.

All because the unit measure was supposed to be tested at the contractor facility.
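To see where the 4.45 figure comes from, here is a minimal Python sketch of the pound-force-second to newton-second mismatch. The conversion factor is the standard definition of the pound-force in newtons; the impulse value and the variable names are made up for illustration and are not taken from any mission data.

```python
# One pound-force second expressed in newton-seconds (standard conversion).
LBF_S_TO_N_S = 4.4482216152605


def to_newton_seconds(impulse_lbf_s: float) -> float:
    """Convert an impulse from pound-force seconds to newton-seconds."""
    return impulse_lbf_s * LBF_S_TO_N_S


# Hypothetical thruster impulse reported by the contractor software, in lbf*s.
reported = 10.0

# What happens if the receiving software assumes the number is already in N*s.
misread = reported

# What the number actually means once converted.
actual = to_newton_seconds(reported)

# Every unconverted reading understates the real impulse by the same factor.
ratio = actual / misread
print(f"read as {misread} N*s, really {actual:.2f} N*s, off by a factor of {ratio:.2f}")
```

The point of the sketch is that nothing in a bare number says which unit it is in, so the error is silent: both values look equally plausible until the craft turns up in the wrong place.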


To quote Josh Bazell, “In metric, one milliliter of water occupies one cubic centimeter, weighs one gram, and requires one calorie of energy to heat up by one degree centigrade—which is 1 percent of the difference between its freezing point and its boiling point. An amount of hydrogen weighing the same amount has exactly one mole of atoms in it. Whereas in the American system, the answer to ‘How much energy does it take to boil a room-temperature gallon of water?’ is ‘go [CENSORED] yourself’ because you can’t directly relate any of those quantities.”

NASA had been working in a strange hybrid system that made data translation ugly and added steps to sharing valuable data with other nations. As seen in this Mars mission, anything that adds steps also adds room for error. Spaceflight calculations don’t need weird, arbitrary numbers to clutter up the already-complicated systems in place.

Between this mission’s failure and other nations’ requests, NASA announced in 2007 a plan to switch entirely to metric. As they say on their website, standardizing units also means that we’ll be easier to cooperate with in the future! And there’s less chance of misunderstandings between co-op projects on other planets. Imagine getting to Mars and realizing the US rover’s 3 miles away instead of 3 kilometers! Or wasting valuable weight on fuel because US gallons are larger than metric liters.

It feels like a miracle that this was the first time an issue like this had ruined a mission!