WQTS 06

This post is a little bit later than usual, thanks to Mother Nature.

It’s raining here in the Bay Area. Despite the way our three-year drought has been monopolizing the media’s attention, the mere fact of rain isn’t really newsworthy. This is, however, a major storm*. We’ve had a couple of days of warnings and–all joking aside, perfectly justified–sandbag distributions.

* By local standards, naturally. Those of you in hurricane and monsoon zones may snicker derisively. I also grant permission for those of you in snow zones to laugh hysterically.

And Pacific Gas & Electric workers have been making the rounds, trimming branches that might bring down power lines and preparing as best they can to handle the inevitable outages.

Before I start discussing the failures here–and they go beyond QA–I want to be totally clear that I’m not dissing PG&E’s field employees. They do a vitally necessary job that carries a high level of risk even in the best circumstances. Kudos to them.

But.

At 7:59, I got home from driving Maggie to BART. This had nothing to do with the weather; I drive her most Thursdays; that it means I’m not trapped at home if the storm knocks out the power is purely a bonus feature. I pulled out my phone and started to send her an e-mail assuring her that I had made it home in one piece. At 8:02, while I was still writing, the power went out.

I was a little surprised it had stayed on as long as it had. I finished the e-mail, sent it off, and made the rounds of the house, shutting off computers (yes, we do have multiple UPSes; doesn’t everyone?). At 8:10, I called PG&E’s automated outage line. This is a voice-recognition system. None of that old-fashioned business of punching numbers on the phone.

The first thing the system does is ask if you’re calling to report a dangerous situation, such as a downed line. I said “no,” and the computer played a pre-recorded message extolling the virtues of using the Web to report outages. Finally it asked “Are you reporting an outage?” I assured the friendly silicon that was exactly what I wished to do. It matched my phone number to billing records, asked me to confirm my address–it was correct–and then informed me that there was not a known outage in my area.

That was the first sign of trouble. I’ve never been the first person to report an outage, even when I’ve called immediately after the lights went out. By the ten-minute mark, there’s no way I’m the first. So, Failure Number One: either the design of the system is faulty, in that it does not inform users when there’s a problem retrieving outage data, or there was a QA failure, and the error detection routines were inadequately tested.
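For the code-minded, here is a minimal Python sketch of the design point behind Failure Number One: keep "we looked and found nothing" separate from "we couldn’t look." Every name in it (`LookupStatus`, `check_outage`, the `fetch` callback) is hypothetical and invented for illustration; it says nothing about how PG&E’s system is actually built.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class LookupStatus(Enum):
    OUTAGE_KNOWN = "outage_known"
    NO_OUTAGE_KNOWN = "no_outage_known"
    LOOKUP_FAILED = "lookup_failed"  # the case the caller never hears about


@dataclass
class LookupResult:
    status: LookupStatus
    customers_affected: Optional[int] = None


def check_outage(address: str, fetch: Callable[[str], Optional[int]]) -> LookupResult:
    """Look up outage data for an address.

    `fetch` is a hypothetical backend call: it returns the number of
    affected customers, None if no outage is on record, and raises if
    the outage data cannot be retrieved.
    """
    try:
        affected = fetch(address)
    except Exception:
        # A retrieval failure is not the same thing as "no outage".
        return LookupResult(LookupStatus.LOOKUP_FAILED)
    if affected is None:
        return LookupResult(LookupStatus.NO_OUTAGE_KNOWN)
    return LookupResult(LookupStatus.OUTAGE_KNOWN, affected)


def caller_message(result: LookupResult) -> str:
    """Pick what the phone system says, one message per status."""
    if result.status is LookupStatus.LOOKUP_FAILED:
        return "We are having trouble retrieving outage information right now."
    if result.status is LookupStatus.NO_OUTAGE_KNOWN:
        return "There is no known outage in your area."
    return f"An outage in your area is affecting {result.customers_affected} customers."
```

The point of the three-way status is that a QA test can force `fetch` to fail and check that the caller hears the "trouble retrieving" message rather than a confident "no known outage."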

But, OK, fine. I assured the computer that my power was out and I wanted to file an outage report. “OK,” said my electronic buddy, “is this a complete outage or a partial outage?”

“Complete,” I replied.

“I’m sorry, I didn’t understand that. Please say either ‘complete outage’ or ‘partial outage’.” Failure Number Two, and I put this one squarely on the design team. Why should I have to say “outage”? The important–and distinctive–information is the first word.
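A quick aside for the programmers: once the speech has been turned into text, accepting the short answer is not hard. Here is a rough Python sketch, assuming the recognizer hands back a plain transcript string. The function name `parse_outage_type` and the matching rules are my own invention, not anything from PG&E’s system.

```python
from typing import Optional

# The two answers the prompt is asking for.
OUTAGE_TYPES = ("complete", "partial")


def parse_outage_type(utterance: str) -> Optional[str]:
    """Return "complete" or "partial", or None if the reply is unclear.

    The trailing word "outage" is treated as optional, so both
    "complete" and "complete outage" are understood.
    """
    # Normalize case and drop surrounding punctuation and whitespace.
    words = utterance.lower().strip(" .,!?").split()
    # Ignore an optional trailing "outage".
    if words and words[-1] == "outage":
        words = words[:-1]
    # Exactly one distinctive word should remain.
    if len(words) == 1 and words[0] in OUTAGE_TYPES:
        return words[0]
    return None
```

With this, `parse_outage_type("Complete")` and `parse_outage_type("complete outage")` both come back as `"complete"`, and anything else falls through to the "I’m sorry, I didn’t understand that" branch.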

“Complete outage,” I said, willing to go along with the joke.

“Please hold,” PG&E’s electronic idiot said. A moment later, I heard a new voice.

“Hello, this is [name withheld to protect the innocent] in Sacramento. Do you want to get the status of an outage?” Failures Three and Four. Design flaw: I was not informed that I was being transferred to a human operator. Design or QA flaw: Said human operator was not alerted that the system had failed while taking an outage report.

NWTPtI was very polite and helpful. The first piece of information she asked for was my address. Failure Five–the automated system had correctly identified me; my account information should have been transferred along with my call. Again, this could be either a design or QA failure.

“There are three hundred forty-six people affected by an outage in your area. There is no estimated time for the return of power yet, but a worker has been dispatched,” NWTPtI informed me. She then asked if I would like to be notified when an estimate was available and again when power was restored, and offered me a choice between a text message and an automated phone call.

I chose the latter, gave her a few more pieces of standard information*, and we concluded the call.

* Including the closest cross street. Shouldn’t that come from a geographical database as soon as she entered my address? I could call this another design failure, but why pile on? After all, it could have been some kind of perverse validation that I wasn’t pranking PG&E.

At 8:47, the power came back on. At 9:59–more than an hour later–I got an automated call from PG&E informing me that my power had been restored at 8:50. I’ll give ’em a pass on the time discrepancy; three minutes is within reasonable rounding error. Hell, I won’t even ding them for the delay in calling. It would be unreasonable to expect them to have enough lines to contact thousands of customers in real time.

But there’s still Failure Number Six: I’m still waiting for the call with the estimated time to make the repair. This one I’m throwing at QA. Either a policy change was made and nobody caught the resulting error in NWTPtI’s script, or software QA missed at least one condition under which a call wouldn’t be made.

The bottom line is that the power is on. That’s far more important than letting me know how long it’s going to take–I’d rather sit in illuminated ignorance than enlightened darkness–but really, PG&E, much as I respect your field workers, I’ve lost quite a bit of respect for your back office personnel.