Capacity Management the hard way

Saturday, 23 April 2011

Please Sir, may I have some Moore?

Back to the blog. I am on holiday at the moment; in my day job I am working at an investment bank, designing new solutions for a new IT/application approach to capacity management.

Capacity Management, when mismanaged, can be like a deponent verb turned inside out: active in form, but passive in meaning.

Why?

The tail-off in CPU clock-speed growth will have to change the way applications are written, implemented, and run. It is also revolutionising capacity management, though nobody has noticed it yet.

Why?

Simple: most applications are written on the assumption that faster CPUs and better compilers will fix up application speed. Memory is cheap, CPUs are cheaper, cache for no cash. Furthermore, there has been a move from tightly coupled processor/operating-system computing to a 'one size fits all' approach.

For example, why was the combination of Tru64 UNIX and DEC Alpha processors so effective in the past, and why is the combination of AIX and POWER7 so effective today? Answer: if one organisation and one set of engineers control both the CPU and the OS, then the OS can be tuned to the CPU: system service calls can use architecture-specific instructions for speed.

The price of this is cost: cost of development and cost of implementation. Specialist OS/CPU combinations have to make life attractive for third parties to develop software. That is why Oracle keeps on supporting IBM's AIX platform. If life becomes unattractive for the third party, support is pulled. Ask HP: to lose one vendor (Microsoft) for the IA64 Itanium chipset is happenstance. To lose Oracle as well is more than a coincidence.

So what about 'one size fits all'? Well, in the beginning there was NT (on x86 and DEC Alpha); then came Windows Server 2008 on x86, x86_64 (the AMD64 extensions, which Microsoft calls x64) and IA64 (Itanium, for now, but not in Windows 201x). For UNIX, Linux and Solaris x86_64 are flying the flag for cheaper hardware.

BUT...

The price you pay for running on Industry Standard Architecture (ISA) is Industry Standard Performance for your non-industry standard applications.

Who has the lion's share of the IT budget outside the public sector? Answer: banking.
Who has the worst applications? Banks. What do I mean by 'worst'? Written to tight deadlines, with functionality put before performance engineering. What happens when that application is chucked over the wall to production? The developers move on to the next version, and production is left to make sure the application doesn't fall over.

Well, industry-standard hardware + applications written to fill a gap in the market = trouble.

In future posts, I will go into some more detail on this...

Saturday, 24 January 2009

Hanging by a Thread - Capacity Planning in a Recession

Well, in Europe, we know we are thoroughly in the mess. With household-name banks announcing a 45bn (GBP) loss, it is either tally-ho back to the Weimar Republic, starring Gordon Brown as the Reichspräsident, or forwards to the euro. Either way, investment banking is a tarnished spirit.

For capacity planners, that is a disaster: most of our work came from that market. As for me, I am spreading my wings to non-investment banking customers.

So what happens in a recession to types like us? OK, here goes...

1. Jobs are eliminated as they are not seen as mission critical
2. The short-term savings in staff costs are wiped out when there are no resources to do the work of sizing systems correctly.
3. Error rates rise: errors in sizing, costing, and performance.

Analysis

In this recession there is an extra wrinkle for us. The products that dragged the banks into trouble, Collateralized Debt Obligations (CDOs) and the even worse CDOs of CDOs, were, through the applications that priced them, responsible for the explosive growth in calculation farms (and therefore blade servers) since 2002. The idea was that these huge calculations would work out the risk associated with a given basket of deals in terms of market activity.
So, what happened? Was it that the applications failed to consider situations where the market bombed beyond this generation's memory? Or was it that the CDOs were built on sand: mortgages which could not be repaid...
If the programs were so good, why did they not forecast this doomsday scenario?

The answer: read the story of the emperor's new clothes. As long as the money was coming in and the commissions and fees were being paid, no-one wanted to see otherwise. Consider this: the business was telling its developers to base their calculations on more and more exotic products; the developers would order huge numbers of blades or other calculation servers from the hardware vendors (especially HP and IBM) to do these calculations faster than the other banks; and as long as profits were good, everyone was happy: banks and hardware vendors. Don't think this is a new thing: the very first calculation servers I capacity-planned (is that a verb? yuk) were DEC AlphaStation 255s.

In fact things got even better for the hardware vendors when blades went multi-core. Since Intel had to shelve its plans for a 4 GHz single-core chip (that didn't stop IBM's pSeries, but that is another article), blade top speeds were quite slow per core, and very hot at data-centre level. For the first time in living memory, developers of calculation-farm applications could not rely on processor speed to bail them out. I'll dig out some graphs somewhere which prove this. In t'good old days, I could leap from a DL580 G1 with a 700 MHz PIII Xeon to a DL580 G2 with a P4 Xeon, with clock speeds rising from 2.0 to 3.06 GHz single core, and my jolly old application would leap ahead.
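That 'leap ahead' is, to a first approximation, the ratio of clock speeds. The sketch below is a minimal illustration, assuming a CPU-bound, single-threaded job whose run time scales inversely with clock frequency. Real gains also depend on work done per clock, cache and memory (the NetBurst P4 did less work per clock than the PIII), so treat the figures as an upper bound.

```python
def naive_speedup(old_mhz: float, new_mhz: float) -> float:
    """Upper-bound speed-up for a CPU-bound, single-threaded job.

    Assumes run time is inversely proportional to clock speed, i.e. the
    same work per clock cycle on both processors (rarely true across
    different micro-architectures).
    """
    if old_mhz <= 0 or new_mhz <= 0:
        raise ValueError("clock speeds must be positive")
    return new_mhz / old_mhz


# PIII Xeon 700 MHz (DL580 G1) against the P4 Xeon clock range quoted for the G2.
for new in (2000.0, 3060.0):
    print(f"700 MHz -> {new:.0f} MHz: up to {naive_speedup(700.0, new):.2f}x")
```

On those figures, a job that took an hour at 700 MHz might have finished in roughly 14 to 21 minutes. Once clock speeds stalled, that free speed-up was gone.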

But because Intel put the brakes on its plans for a single-core 4 GHz chip, applications had to use many more blades than they would otherwise have needed. Worse, to get the best out of the multi-core CPUs, one had to recompile the code with chip-specific instructions using the Intel compiler. This recompilation simply could not be done well, because of the changes to floating-point calculations that the new compiler would require, and the phenomenal amount of re-testing by the gods of the Analytics groups this would involve. In the past, developers would hard-wire code to deal with known floating-point issues with the compilers on certain chips. Not possible in this world unless you write a new app!
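The blade-count problem reduces to simple arithmetic. The sketch below is a minimal illustration, assuming a hypothetical overnight batch of independent, single-threaded calculations (one per core) whose run time scales inversely with clock speed. The batch size, task time and window are made-up figures, and the calculation ignores scheduling overhead, I/O and failures.

```python
import math


def blades_needed(tasks: int, task_secs_at_ref: float, ref_ghz: float,
                  blade_ghz: float, cores_per_blade: int,
                  window_secs: float) -> int:
    """Blades required to finish a batch of independent single-threaded
    tasks inside a batch window.

    Assumes each core runs one task at a time and that task time scales
    inversely with clock speed relative to a reference CPU.
    """
    task_secs = task_secs_at_ref * ref_ghz / blade_ghz
    core_secs_needed = tasks * task_secs
    core_secs_per_blade = cores_per_blade * window_secs
    return math.ceil(core_secs_needed / core_secs_per_blade)


# Hypothetical batch: 200,000 tasks of 60 s each on a 3.0 GHz reference
# core, to be finished inside a 6-hour overnight window.
window = 6 * 3600
print(blades_needed(200_000, 60, 3.0, 3.0, 2, window))  # dual-core 3.0 GHz: 278
print(blades_needed(200_000, 60, 3.0, 2.0, 4, window))  # quad-core 2.0 GHz: 209
print(blades_needed(400_000, 60, 3.0, 2.0, 4, window))  # workload doubles: 417
```

With clock speeds flat, every doubling of the calculation load roughly doubles the blade count; there is no faster processor to absorb the growth.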

So what does all this mean for us?

Well, there will be lots of unused blade farms in unused data centres as banks pull out of CDO-type activity (the 'burnt child fears the fire' syndrome). Lots means literally hundreds of blades: very high spec, very high cost, very long depreciation. A lot can happen to a bank in three years...

With no customers for these blade servers, data centres have a golden chance to migrate older non-blade x86 servers onto them, clearing out cabinets at a stroke. Whether this will be done is another matter. On the one hand, the business that owns these servers and pays their infrastructure costs will want to make sure any new customers are charged accordingly. On the other hand, any new owners will say to the previous owners: 'You lot got the bank into all this trouble anyway. We're taking them, and you should all be sacked for reducing my bonus to zero...'

So capacity planning will consist of modelling workloads moving around servers: V to P, P to V, non-blade to blade, and any which way. There are few people, and even fewer software products, that can help you do that in a client-server environment. To put it bluntly, I know of only five people in Europe who have the **** to do this, and only one software product that ever could: PAWZ Planner from PerfCap Corporation.
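At its simplest, moving workloads onto fewer servers is a bin-packing problem. The sketch below is a minimal illustration using first-fit decreasing, a standard heuristic. It is not how PAWZ Planner or any other product works. The server names and figures are hypothetical, and each workload's peak demand is assumed to be already normalised to a percentage of one new blade (for example, via a benchmark ratio). A real model would also check memory, I/O and whether peaks coincide.

```python
from typing import Dict, List


def first_fit_decreasing(demands_pct: Dict[str, int],
                         ceiling_pct: int = 70) -> List[List[str]]:
    """Pack workloads onto as few blades as possible.

    demands_pct: peak CPU demand of each workload, as a percentage of one
        blade's capacity (assumed already normalised, e.g. by benchmark ratio).
    ceiling_pct: planning ceiling per blade, leaving headroom for peaks.
    Returns one list of workload names per blade.
    """
    blades: List[List[str]] = []
    loads: List[int] = []
    # Largest workloads first; each goes onto the first blade with room left.
    for name, demand in sorted(demands_pct.items(),
                               key=lambda kv: kv[1], reverse=True):
        if demand > ceiling_pct:
            raise ValueError(f"{name} needs more than one blade")
        for i, load in enumerate(loads):
            if load + demand <= ceiling_pct:
                blades[i].append(name)
                loads[i] += demand
                break
        else:  # no existing blade had room: start a new one
            blades.append([name])
            loads.append(demand)
    return blades


# Hypothetical legacy x86 servers: peak demand as % of one new blade.
legacy = {"srv01": 40, "srv02": 25, "srv03": 30,
          "srv04": 10, "srv05": 35, "srv06": 5}
for n, names in enumerate(first_fit_decreasing(legacy), start=1):
    print(f"blade {n}: {names}")
```

Here six servers fit onto three blades under a 70% ceiling. First-fit decreasing is quick and usually close to optimal, but it is only a heuristic. Because it packs on a single peak figure, servers whose peaks never coincide could be packed more tightly using time-series data.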

Think about it: the world is evolving, and capacity planners must evolve with it. Otherwise one becomes as extinct as the dodo, or, in computer terms, 'Tru64 UNIX'...

g'night,

pos

Thursday, 13 November 2008

HP World 2008 Germany

Just been to HP World Germany, talking about capacity planning on Itania and other servers. A very small number of customers, a very large number of HP people. Why? Because it was too expensive to attend in this economic climate.

Shame really: if HP charged, say, 25 quid rather than 1,200 quid to attend the sessions (never mind the cost of transport), loads of people would come, and HP would have its audience.

Ah well...

Thursday, 3 July 2008

Grid and Application Designs

Traditionally, the way to improve an application's performance was to throw hardware, mainly CPU, at it and hope it worked. In the x86_64 grid space, this has been the rule for the past 10 years.

But what now? Blade CPUs top out at around 3.0/3.2 GHz, and quad-core clock speeds are likely to be lower, at 1.7/2.0 GHz. The traditional CPU route is suddenly running out of road.
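Amdahl's law shows why the quad-core route disappoints a single application. The law bounds the speed-up from n cores when only a fraction p of the work can run in parallel. The sketch below is a minimal illustration, assuming performance scales linearly with clock speed and using the clock speeds quoted above (3.2 GHz dual core against 2.0 GHz quad core). The parallel fractions are illustrative, not measured.

```python
def relative_speed(clock_ghz: float, cores: int, parallel_fraction: float) -> float:
    """Speed of one application relative to a 1 GHz single core.

    Amdahl's law: speed-up on n cores = 1 / ((1 - p) + p / n), where p is
    the fraction of run time that can run in parallel. Clock speed is
    assumed to scale performance linearly (same work per clock).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    p = parallel_fraction
    return clock_ghz / ((1.0 - p) + p / cores)


for p in (0.0, 0.5, 0.9):
    dual = relative_speed(3.2, 2, p)
    quad = relative_speed(2.0, 4, p)
    print(f"p={p:.1f}: 3.2 GHz dual-core {dual:.2f}, 2.0 GHz quad-core {quad:.2f}")
```

On these assumptions, a purely single-threaded application takes 1.6 times as long on the 2.0 GHz quad-core blade. The quad core only wins once roughly 86% of the run time is parallel. A farm of independent single-threaded tasks is a different case, because each core can run its own task.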

Options?

1. Compiler optimization: use smarter compiler options which take single-threaded applications and 'fake' multi-threading (automatic parallelisation) on certain CPUs.
2. Conserve space by replacing all single- and dual-core blades with quad cores.
3. Look at application re-design. Find threads which consume a whole CPU when called and optimise them.
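Option 3 starts with measurement: finding the threads that keep a whole CPU busy when called. The sketch below is a minimal illustration, assuming you already have two samples of cumulative CPU seconds per thread, taken a known interval apart (from ps, /proc, a profiler or a performance collector). The thread names and figures are hypothetical. A thread at or near 100% of one CPU is single-threaded and CPU-bound, so it will not get faster on a slower multi-core blade unless it is redesigned.

```python
from typing import Dict, List, Tuple


def hot_threads(before: Dict[str, float], after: Dict[str, float],
                interval_secs: float,
                threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Threads that used at least `threshold` of one CPU during the interval.

    before/after: cumulative CPU seconds per thread at the start and end of
        the sampling interval. Threads absent from either sample are skipped.
    Returns (thread, share of one CPU) pairs, busiest first.
    """
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    busy = []
    for thread, end in after.items():
        start = before.get(thread)
        if start is None:
            continue
        share = (end - start) / interval_secs
        if share >= threshold:
            busy.append((thread, share))
    return sorted(busy, key=lambda item: item[1], reverse=True)


# Hypothetical samples of cumulative CPU seconds, taken 60 seconds apart.
t0 = {"pricer-main": 120.0, "io-worker": 15.0, "logger": 2.0}
t1 = {"pricer-main": 179.4, "io-worker": 18.0, "logger": 2.1}
for thread, share in hot_threads(t0, t1, 60.0):
    print(f"{thread}: {share:.0%} of one CPU")
```

Only pricer-main is flagged here. It is the candidate to redesign before anyone orders more blades for it.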

Consider this: the blades bought 3 or 4 years ago (single or early dual core) waste too much space in the data centre, and need refreshing with newer quad-core blades.
The business will have to pay a heavy price for such early investment, and should ask: why do we always have to buy more hardware every year in such volume? Why not look at how the application is written, and tune the code to run on the right processor?

regards

pos

Tuesday, 18 March 2008

Updated website from PerfCap Corporation

After a long wait, it seems that PerfCap Corporation has finally got the web-masters in. New features include fora, feedback, better product descriptions, and generally more information. At least it will be a resource for their many European customers.

Saturday, 8 March 2008

Storage Capacity Planning - SASAN new software

London: 07 March 2008.
Following decades of server capacity planning, the same tools have been modified to provide storage capacity planning, multi-vendor and multi-what-if: SASAN (Storage Analysis for Storage Area Networks).

Features:

EMC data analyzer
EVA data analyzer
UNIX and Windows data analyzer
OpenVMS and Tru64 specialist data analyzers
OVPA Support
SAN Mapper and configurator
SRDF and remote storage support
Intelligent IO mapping
Intelligent RAID scaling and host/controller based modelling
Automatic production of reports and what-if scenarios
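RAID scaling is one place where host-level figures mislead, because a single host write becomes several back-end disk operations. The sketch below is a minimal illustration of the common rule-of-thumb write penalties (RAID 0: 1, RAID 1/10: 2, RAID 5: 4, RAID 6: 6). It ignores controller cache, full-stripe writes and read caching, and it does not describe SASAN's own model. The workload and per-disk IOPS figures are hypothetical.

```python
import math

# Rule-of-thumb back-end disk I/Os per host write, by RAID level.
# Ignores controller write cache and full-stripe writes.
WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(host_iops: float, read_pct: int, raid_level: str) -> float:
    """Back-end IOPS generated by a host workload with read_pct % reads."""
    if not 0 <= read_pct <= 100:
        raise ValueError("read_pct must be between 0 and 100")
    reads = host_iops * read_pct / 100
    writes = host_iops * (100 - read_pct) / 100
    return reads + writes * WRITE_PENALTY[raid_level]


def disks_needed(host_iops: float, read_pct: int, raid_level: str,
                 iops_per_disk: float) -> int:
    """Minimum spindles for the back-end I/O load (space not considered)."""
    return math.ceil(backend_iops(host_iops, read_pct, raid_level) / iops_per_disk)


# Hypothetical: 2,000 host IOPS, 70% reads, disks assumed good for ~180 IOPS.
for level in ("RAID10", "RAID5", "RAID6"):
    print(f"{level}: {backend_iops(2000, 70, level):.0f} back-end IOPS, "
          f"{disks_needed(2000, 70, level, 180)} disks")
```

On these assumptions, the same 2,000 host IOPS need anywhere from 15 to 28 disks depending on the RAID level. That is why host I/O rates alone cannot size a SAN.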
Email for more information, examples, etc. pos@positechconsulting.com

Friday, 7 March 2008

Better Software Required: and supplied

From recent chats with customers on all sides of the IT divide (infrastructure, application developers and business analysts), it seems to me that there is a huge gap in the Capacity Management cycle between business process definition and the mapping of transactions to processes or fractions of processes.

So, in the best tradition of chess players who 'announce' checkmate in five moves, followed by a shuffle six moves later, I am going to make a software pre-announcement. For my sins. (Software pre-announcements are generally called 'vapourware', by the way: not in this case.)

Release Date: June 2008.
Company: Positech Consulting
Product Name: not divulged
Internal name: snibbo.
Compatibility: anything
Function:
A dynamic on-line tool for mapping business processes to volumetric analysis software to produce a true end-to-end business function to infrastructure map - to the data centre fundamentals.
Features:
Everything.

Automatic business mapping from templates, or user-defined maps.
Matching of processes to functions
Full-scale performance analysis to a user-defined transaction level
Full-scale capacity planning process giving response time breakdown per workload, and what-if analysis to cover business volumetric and organic growth
Fully Automated
I will repeat that: Fully automated.
Platforms: UNIX, Windows, Linux, OpenVMS
Feeds: any
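The core chain behind this function and the response-time what-if is short: business volume becomes transactions, transactions become CPU utilisation, and utilisation becomes response time. The sketch below is a minimal illustration, assuming a single CPU modelled as an M/M/1 queue (R = S / (1 - U)) with hypothetical volumes and service times. It does not describe how the planned tool works; a real model would have one queue per resource and would handle multiple CPUs.

```python
def utilisation(events_per_hour: float, tx_per_event: float,
                cpu_secs_per_tx: float) -> float:
    """CPU utilisation (0..1) of a single CPU driven by a business volume."""
    tx_per_sec = events_per_hour * tx_per_event / 3600.0
    return tx_per_sec * cpu_secs_per_tx


def mm1_response_time(service_secs: float, util: float) -> float:
    """M/M/1 response time R = S / (1 - U); unbounded once U reaches 1."""
    if util >= 1.0:
        return float("inf")
    return service_secs / (1.0 - util)


# Hypothetical: 18,000 trades/hour, 5 transactions per trade, 20 ms of CPU
# per transaction, on one CPU. What if business volume grows 50% or 100%?
SERVICE_SECS = 0.020
for growth in (1.0, 1.5, 2.0):
    u = utilisation(18_000 * growth, 5, SERVICE_SECS)
    r = mm1_response_time(SERVICE_SECS, u)
    print(f"volume x{growth}: CPU {u:.0%}, response {r * 1000:.1f} ms")
```

Doubling the volume here does not double the response time; at 100% utilisation the queue grows without bound. That non-linearity is exactly what a what-if on business and organic growth has to expose.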

wish me luck. I had better start coding now.

pip-pip