PUCK Outage Information

February 14th, 2013

So, we often reboot machines with little to no consequence. We reboot our phones, cars, laptops, desktops and even servers. That uneventful outcome isn’t what happened to me on Monday.

So, many years ago I moved my machine out of my home and decided it would be a good idea to pool resources with several other people whom I was either hosting or sharing space with. Being a technology person, I had a T1 at my home from 1997-2010. Friends and others would share resources with me, and I returned the favor in kind.

I have used a variety of technology over the years from the FreeBSD jail support in 4.8 (with a patch) up to the FreeBSD 7-8 series. Due to personal preference and my desire to spend less time compiling things (Plus the fact that I disagree with FreeBSD packaging, development and have had problems with modern hardware support…) I undertook building a replacement host in 2011.

FreeBSD jails can be quite elegant. You can run multiple servers on one physical machine and share the pool of disk space, CPU and memory, all without being limited to a fixed CPU count or memory footprint within a virtual machine as you are with VMware and other systems. Having used VMware in some form since my original 1.0.x license that expired in 1999, I wanted to provide a reasonable service to those I shared with.

I moved the system to Linux, and the closest thing I could find at the time that wasn’t going to limit the CPU/memory/disk usage was Linux-Vserver.org. This required a small kernel package and was distributed as part of Fedora in the base OS without trouble. There were a few limitations to management, but I was willing to live with them at the time and proceeded to move over ~7 machines to the new hardware. Sometimes I would stand up something for a friend then tear it down, but on Monday there were a total of 8. (One I have left down until the owner contacts me.)

So during the Monday reboot, the goal was to upgrade the IPMI interface on the motherboard (SuperMicro X9SCA-F) as well as various firmware on the SAS controller.

What happened next was something that would consume me for the next 48 hours.

Upon rebooting the system, the virtual machines would not start properly. I tried upgrading and downgrading the related packages, and rebuilding with the latest kernels and modules… I waited through a very long BIOS and SAS boot-up and initialization process (it takes ~45 seconds for the mpt2sas driver to probe my 4 disks) each time I rebooted the machine. When I typed “shutdown -r now”, the IPMI interface would show the system actually powered off instead of rebooting. When you are sleep deprived and feeling a small bit of pressure, these small things feel worse.

At some point approaching 24 hours into the process the decision was made to just move all the systems into VirtualBox. You can judge and whatnot, but it was easy. It was free, and I found documentation online about using qemu-nbd to be able to mount and rsync/move the files from the ~1.8TB /home partition that had puck.nether.net and the other hosts over.

Well, in theory. When I built the system, it was the height of the hard drive shortage. I was also “cheap” and just got 4x1TB 7200RPM SATA disks. The chassis is 2U and only has 8 bays. It turns out interesting things happen that slow you down, such as the I/O performance of the RAID 1+0 setup not being what you would like. As usual, linear reads can run fast, but the many small files that people collect on their systems take a long time to stat() as part of the rsync process. The disk cache never seems like enough, and most filesystems don’t perform well under this load.

After trying to rsync the data over with qemu-nbd, it turned out this was corrupting the new VM vdi file filesystem. One system took 3-4 tries to get it recovered right and I finally had to destroy the file and redo everything. Trying to run 7 parallel rsyncs as well? Will cause some really high numbers with iostat -x … you will see read/write wait times approaching 10+ seconds. I’ve seen some mean numbers this week, and those felt like they were slowing me down. Turns out doing them one-at-a-time may have worked out better, but I was hoping the OS disk cache would work better than it did… Also, when you see these long iowait times, it’s enough to cause an OS in VirtualBox (at least) to time out the emulated disk and reset the internal disk controller(!). This was not expected.
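The one-at-a-time approach mentioned above can be sketched roughly as follows. This is only an illustration, not what was actually run that week: the function names and the example paths are made up, and the rsync options shown (-a, -H, --numeric-ids) are simply a common choice for copying whole system trees.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple


def build_rsync_command(source: str, dest: str) -> List[str]:
    """Build one rsync invocation (archive mode, keep hard links, numeric IDs)."""
    return ["rsync", "-aH", "--numeric-ids", source, dest]


def sync_one_at_a_time(
    jobs: Sequence[Tuple[str, str]],
    run: Callable = subprocess.run,
) -> List[int]:
    """Run each copy to completion before starting the next.

    Only one rsync is walking the disks at any moment, so the spindles are
    not seeking between 7 different directory trees at once.
    """
    exit_codes = []
    for source, dest in jobs:
        result = run(build_rsync_command(source, dest))
        exit_codes.append(result.returncode)
    return exit_codes


# Hypothetical usage (paths are placeholders, not the real layout):
# sync_one_at_a_time([
#     ("/home/vservers/host1/", "/mnt/newvm1/"),
#     ("/home/vservers/host2/", "/mnt/newvm2/"),
# ])
```

Watching iostat -x in another window while this runs should show whether the wait times stay reasonable before deciding to try more than one copy at a time.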

After many hours in the process I decided to take a nap Tuesday morning and got in about 3 hours of sleep. Tuesday night, I got more as I waited for the syncs to happen. Sometimes it’s just OK to leave something down and broken for a bit longer. Nobody was “really” screaming about things, but I felt obligated to fix it ASAP.

Of course, once I started to get the machines turned up there were the inevitable problems. Mailman bounced a lot of mail as it wasn’t permitted by smrsh, but the user email worked ok. The load average on the new VM went very high during the mail processing and would periodically reject the messages.

There’s a lot more that could be included, but I wanted to highlight a few last things. Having more spindles is good. Having friends that will look at something when you are sleep deprived is good. Perhaps using a VM isn’t as evil as I had originally thought, but it still isn’t my first choice. Taking a nap and leaving things broken? Good.

Having a wife that is understanding and didn’t shoot me? Very good. I don’t think she often realizes how much she is appreciated, but she is more than I will share in public here.

Hope everyone is having a better week.. I promise to not upgrade anything else for the next 15 minutes.

Grainger for cool supplies

January 18th, 2013

I’ve been doing a lot of weird projects around the house, either my indoor rock wall, snow making or other things. As a result, the usual locations of Lowes and Home Depot can be very-expensive or don’t even stock the parts you need. I wanted to make a short list of the things I’ve purchased in recent years, including how and where you can save some major money as a result.

You may think you can’t buy from Grainger because they only sell to a business, but there are a few exceptions to that.

1) If your employer has an account there, you can buy things as an “Employee Purchase”. Typically all you need is the main phone number, but also having the account number can be helpful.
2) You can sign up for a business account. This may require doing something like setting up an LLC. In Michigan, where I live, this is around $50.
3) Find someone else that has an account and use theirs but do an employee purchase as well.

It shouldn’t be hard to do one of these. Here are the reasons why you might want to.

For my rock wall, I needed about 200 3/8-16 T-nuts. These are typically very expensive at a place like Lowes. At Grainger (Part# 1XGJ1) they are $16 for a box of 100 instead of roughly 50c-$1 each. The same is true for the socket-cap screws (bolts), which are about $1-1.50 each at Lowes; at Grainger you can get a box of 50 (4XE65) for $18. Keep in mind this size works for *most* holds, but some require longer bolts of 2 1/2 inches or more.
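To put rough numbers on that, here is a quick back-of-the-envelope calculation using only the prices quoted above. The variable names are just for illustration, and the retail figures are the approximate per-piece ranges mentioned, not exact shelf prices.

```python
import math

# T-nuts: about 200 needed, sold at Grainger in boxes of 100 for $16.
t_nuts_needed = 200
boxes_needed = math.ceil(t_nuts_needed / 100)
grainger_t_nut_total = boxes_needed * 16.00

# Approximate big-box retail range of 50c to $1 per T-nut.
retail_t_nut_low = t_nuts_needed * 0.50
retail_t_nut_high = t_nuts_needed * 1.00

# Socket-cap screws: box of 50 for $18 versus about $1-1.50 each at retail.
grainger_bolt_unit_price = 18.00 / 50

print(f"T-nuts at Grainger: ${grainger_t_nut_total:.2f}")
print(f"T-nuts at retail:   ${retail_t_nut_low:.2f} to ${retail_t_nut_high:.2f}")
print(f"Bolts at Grainger:  ${grainger_bolt_unit_price:.2f} each")
```

So the T-nuts alone come to about $32 instead of somewhere between $100 and $200, and the bolts work out to about 36 cents each.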

For my snow maker, I needed a liquid-filled pressure gauge to measure the water PSI. This was not something that was in stock at Lowes/Home Depot; from Grainger it cost only $20 and arrived next-day at Will-Call.

I’ve also used them to replace the blower motor in my furnace when the local furnace repair shop did not call me back. (Nothing like waking up to the smell of a burning electric motor!)

Culture of Crisis

April 19th, 2012

I wanted to take a moment and explain my claim on the term “Culture of Crisis”. I had a chance several years ago to do some consulting work. Through this I came to have a newfound understanding of what can easily happen at any organization if they don’t keep their operational culture in check.

Every company and group faces a challenge when something goes wrong. Some have a detailed process, including the need to write postmortem reports, assign blame elsewhere or to punish those that were responsible.

When one is hit with a crisis, this distracts from whatever else was being done. While some employees may sit idle waiting for the panic button to be pressed, usually the most valuable employees get involved quickly in a crisis as they have experience fixing things.

One has to be careful to avoid an all-hands-on-deck strategy for responding to problems. It can be useful if your team or company is just 5 people, where the problem may actually require everyone to solve it. Generally, larger organizations do not require this.

Be mindful of how many people you involve in solving your problem. Don’t have a crisis conference call where the business and technical people are together discussing the impact. Split these, but maintain communication. Have a leader willing to ask questions and direct the response. Ask questions. Communicate with the experts. Engage only those necessary. Having everyone join the 911 call can lead to a situation where everyone is there, but nobody is willing to speak up.

The shotgun approach to problem solving is good if you need a large team to solve the problem, such as cleaning up a disaster site. Responding to a technical issue works better having the right people engaged. Too many people and it becomes any other large meeting with people worried about the “internal or external optics” of the event.

SOPA and Protect IP

January 18th, 2012

Please take 5 minutes today to call both your senators and your representative. If you don’t know who they are, look that up here: https://writerep.house.gov/writerep/welcome.shtml and http://www.senate.gov/general/contact_information/senators_cfm.cfm

Protect IP and SOPA affect the technical underpinnings of the internet that you depend upon daily. They are also easily bypassed by using an IP address to reach a site, such as typing 74.125.225.81 to reach Google instead of using its name.
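A small sketch of why the bypass works, assuming the blocking is done at the name-lookup (DNS) step: a numeric address never needs to be looked up, so a filter on names never sees it. The function name here is made up for illustration, and the address is simply the one quoted above, which may not point at Google forever.

```python
import ipaddress


def needs_dns_lookup(host: str) -> bool:
    """Return True if the host is a name that must be resolved first.

    A literal IP address can be connected to directly, so any filtering
    applied to name lookups is skipped entirely.
    """
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True  # not a literal address, so it is a name
    return False


print(needs_dns_lookup("google.com"))      # name: goes through DNS
print(needs_dns_lookup("74.125.225.81"))   # literal address: no DNS involved
```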

A simple script for you to follow if you are not sure what to say:

Hello, I live in (Location, eg: Ann Arbor, Michigan/City/Township) and was wondering what the Senator’s/Representative’s position is on the proposed legislation of SOPA (House – HR 3261) and/or Protect IP (Senate – S. 968).

Be respectful when talking on the phone, and convey any feelings you have on the topic.

Shipping Industry and Travelers

October 31st, 2011

I sometimes travel for work and pleasure, but also have packages shipped to my home without some sort of pre-notice of what they are or when they will arrive. For several years I have been subscribed to FedEx Insight to get notices of inbound packages. This is very nice, as we could enter many variations of our home address and of the people who might receive packages, and get notices for all of them. It is often incomplete, as shippers can block this visibility or a package may be addressed to yet another variation that we are unaware of.

UPS has finally launched a consumer-oriented version of this service they dub “My Choice”. One can sign up online and get notices when packages are inbound, usually the night before. What’s even better is that you can ask them to hold/delay delivery, and see which items may require a signature. They also give an estimated time of delivery based on the schedule and expected load of the drivers.

I’ve already come to love this capability as it can easily identify when a package is coming to our home. If you miss your driver, or have people ship items to you unexpectedly, I strongly suggest you sign up for this service, as well as the FedEx variant. FedEx will call you to verify each addition, even if it’s just an address variant (eg: LN vs LANE).

Thoughts on T-Mobile USA, Sprint and AT&T

October 11th, 2011

Sprint (S) has seen its stock price slide 34% in the past week and is currently down 65% from its 52-week high. The slide has continued as a result of the recent announcement that Sprint will convert to an LTE cellular network from the WiMAX network deployed by its partner Clearwire (CLWR).

Some are now predicting that Sprint will be forced to file for bankruptcy due to the increased debt load they will be required to carry.

This has consequences in other market segments, including for the AT&T (T) acquisition of T-Mobile USA (a subsidiary of DTE.DE). For many months I have privately predicted that AT&T will be on the hook for the $3 bln breakup fee. There are only a few global players that have the capital available to acquire Sprint. Sprint is a major player in the federal telecom space; consequently, any of these would face blockage by the US Government as part of the Treasury Department CFIUS process.

With Verizon (VZ), Sprint, AT&T and T-Mobile as the current players, the elimination of both Sprint and T-Mobile is not something the Department of Justice will stand for in its antitrust case against AT&T.

Prediction:

With the pending credit downgrades of Sprint, I do not expect the pressure on them to decrease, and I consider an acquisition unlikely. The most likely outcome I see is that the T-Mobile acquisition is not completed and Sprint files for bankruptcy in the second half of 2012. The fate of T-Mobile USA is uncertain; perhaps Sprint can shift their commitments from Clearwire to T-Mobile, but adding a 3rd incompatible network (CDMA vs iDEN vs GSM/LTE) is going to be a major capital process regardless. Short of splitting the wireless and fixed network assets to promote value, I’m not sure anyone wants to pick up the disparate technologies and assets to continue an integration plan.

DISCLOSURE: Author owns none of the stocks mentioned in this article.

Updates on Fiber Projects, Regulation etc

December 16th, 2010

It’s been a year since I started my research into building fiber to my home to solve the problem of no willing providers (eg: Comcast, ATT) in my area.

I thought I’d provide some updates.

I no longer have a T1 at home; it has been replaced with a wireless link using the Ubiquiti Nanobridge M5. This allows me to bridge the 3 mile gap (shorter in another direction, but no line-of-sight that way) to pick up another connection. Cost savings: about $200/mo. Speed increase: roughly 10x. The link actually gets ~60Mb/s but the wired part of the network isn’t fast enough to allow those data rates.

I submitted data to the Google RFI, but don’t believe any project in Michigan will be successful. Despite high-ranking U of Michigan alums at Google, I suspect the state laws and climate will not be favorable to such a project.

A lot is being made these days about Comcast and their network management practices. Constraining how much internet access they buy is a legitimate business action they can take. I can defend their choice to make that decision. The problem I have is the monopoly (or duopoly in some areas) for local internet access. Some people may have little or no choice in what access they receive. Alternatives such as cellular aren’t real alternatives with individual software updates around 1GB in size.

If you have $1 to $50 million to spare, I would be interested in pitching the idea of a large fiber project to you. I think there’s benefit as well to provide infrastructure for smartgrid and other communications in parallel with a countywide wired network. If you tie in meter reading (electric, water, gas) there could be significant savings and value. I do wish that the utility companies (eg: DTE, Consumers, etc) would leverage their existing RoW for placing fiber to deliver further competition to supplement their electric and natural gas business. I think this combined with changing state laws is the only method that will be viable in the future.

Local schools and counties have unused fiber assets, but are unwilling to make them available as they were paid for out of school budgets.  I do wish they would reconsider to provide additional school funding, but also to make those routes available for less.

I doubt logic will prevail here, but there is hope.  If the current incumbents stumble in a major way, change will become necessary and possible.

Ongoing thoughts about internet access

October 6th, 2010

I’ve spent a lot of time thinking over the years about solving the rural internet gap. The ability to make it possible is mostly limited by the “bootstrap” costs necessary.

Would you be willing to pay a $1000 install fee, and $50/month for internet access?

Would you support a ‘neutral’ last-mile provider of an “internet” pipe to your home, allowing you to select from a set of ISPs that can utilize that link for various services (eg: TV, VOIP)?

Gizmodo Banned from WWDC

June 6th, 2010

Gizmodo has been banned from WWDC 2010, and it sure seems like they’re going down the “Please-Hurt-Me” path. They are looking to violate the “hot news doctrine” which has been decided case law since 1918 (248 U.S. 215).

They are going to aggregate and re-publish other “hot news” from their competitor blogs. This was ruled illegal then, and surely applies in this case where they were BANNED and cannot blog live legitimately.

Should be interesting to see what happens tomorrow.

TV stations being asked to move for mobile (Cellular) internet

March 3rd, 2010

It’s been reported recently that part of the FCC broadband plan is the desire to move some broadcast TV stations out of the 500 MHz frequency band so this can be used for cellular companies to have more bandwidth for their userbases.

This is entirely the WRONG move. Much more can be attained by using smaller cellular sites, and the deployment of picocell/microcell technology. If each home had even just 5Mb/s internet access, and a microcell device, the need for this would be minimized and cellular customers would be happier with increased coverage at their homes/offices.