Hardware Headaches

Earlier this year, I began to notice my muskie reel running very rough. Instead of a smooth, quiet spin, you could hear the gears grinding against each other as I cranked on the handle. Even worse, the reel would often skip, or reverse, when I was yanking with the rod at the same time as I was pulling in line by turning the handle. I realized this was not going to go away or fix itself, so I sent the reel in to Badger Reel Repair. They were able to fix the reel, but they had to replace many of the interior parts. I was informed rather directly that my current reel, an Abu Garcia Ambassadeur 6600 C4, was not built to handle the type of fishing that I do. Ripping big rubber baits such as Bulldawgs, as well as large bucktails like twin 10s, will wreak havoc on my reel. As a result, I'm in the market for a new one. It's kind of disappointing because my C4 is a $140 reel, and I'd expect that to stand up to anything, but I guess that is not the case. My prime choices for a new reel so far are the Shimano Calcutta 400B, the Abu Garcia Revo Toro NaCl, and the new Abu Garcia Revo Toro S, which doesn't come out until September. Although it will stink to shell out $200 – $300 on a new reel, it will be fun to use some new equipment and to compare its performance to that of my current reel.

[Photo: Shimano Calcutta 400B]
It can be pretty frustrating when equipment (or hardware) doesn't work the way it should. During our recent server upgrade, I put forth great effort to ensure everything would go smoothly. I created a set of procedures and practiced those procedures multiple times. I closely inspected the current server to find out what needed to be migrated and what could be left behind and forgotten. I did everything I could think of to make this a success and a positive experience for the business.
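A server inspection like that mostly boils down to queries against the system catalogs. The sketch below is generic T-SQL rather than my exact scripts, but it covers the usual suspects: user databases, Agent jobs, linked servers, and Database Mail.

```sql
-- Rough inventory of what lives on the old server, so nothing gets
-- forgotten during the migration (generic examples, not the real scripts).

-- User databases (skip the four system databases)
SELECT name, state_desc, recovery_model_desc
FROM sys.databases
WHERE database_id > 4;

-- SQL Agent jobs that need to be scripted out and recreated
SELECT name, enabled
FROM msdb.dbo.sysjobs;

-- Linked servers the applications might depend on
SELECT name, product, data_source
FROM sys.servers
WHERE is_linked = 1;

-- Database Mail profiles to recreate on the new box
SELECT name, description
FROM msdb.dbo.sysmail_profile;
```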
The day of the migration went smoothly. There were a few minor hiccups, the worst of them involving Database Mail, but overall everything went fine. I did the upgrade on a Saturday, and everything seemed gravy into the next week. Then on Wednesday night, the new SQL Server errored out. I checked the logs and they said the data and log files were inaccessible. I went to check for the files and found the drives were gone! A reboot of the server brought them back, and after running DBCC CHECKDB on all databases I was able to confirm we were back online. A check of the Windows Event Logs showed one error at that time: “The driver detected a controller error on \Device\RaidPort5”. This isn’t the type of error that most DBAs can troubleshoot. In fact, it eventually took several IBM techs to get to the bottom of the issue.
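For the curious, that integrity check was nothing fancy. Something along these lines, using the undocumented but widely used sp_MSforeachdb helper (a cursor over sys.databases would do the same job), is roughly what it amounted to, along with a quick search of the SQL Server error log:

```sql
-- Run DBCC CHECKDB against every database after the drives reappeared.
-- sp_MSforeachdb is undocumented; substitute your own loop if you prefer.
EXEC sp_MSforeachdb N'DBCC CHECKDB ([?]) WITH NO_INFOMSGS;';

-- Scan the current SQL Server error log for the messages about
-- inaccessible data and log files (0 = current log, 1 = SQL Server log).
EXEC sp_readerrorlog 0, 1, N'inaccessible';
```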
We tried updating some firmware, but the issue occurred again the following night. This time our system administration vendor brought in an IBM tech to troubleshoot. After multiple hours on hold with IBM support, the IBM tech found the RAID card to be loose in the server node. He clipped it back in tightly and we gave it another go. That night, the issue occurred yet again. At this point, the disruptions to our production environment were just too great. Later that evening, the Friday after the initial upgrade, we failed back over to our old database server. The issue is still ongoing, but we are hopeful that a replacement RAID controller will fix it.
One frustrating thing is that the issue never appeared during the month and a half we had the new server in standby, and it hasn’t happened since we took it out of production. We don’t really know how to tell if the issue is fixed without putting it back into production, where it causes severe problems when it bombs. The other frustration is how I perceive this incident has affected other people’s impression of my work. I tried very hard to make the process go smoothly, but in the end I was derailed by something that was never in my control. All the users know is that I upgraded the server, and suddenly we are having system outages. I wish there were a lesson here, but in retrospect I can’t think of anything I could have done differently. The only way we might have been able to catch this is if we had been able to simulate a production-type load on the new server before we put it into production (which we don’t have the capability to do at this small employer), and even then, the failures occurred off hours when the load was very minimal. We may not have found it regardless.
