Monday, December 15, 2008

Performance Tuning JCAPS - Part 2

It's been close to 2 months since my first Performance Tuning JCAPS post. Since then, we've noticed our servers running out of TCP connections under heavy load. Researching this problem, we learned that the JCAPS JMS server makes heavy use of TCP in implementing Request/Reply queues. We approached Sun for guidance and they assured us that they've had success implementing high-throughput applications using the JMS Request/Reply solution... but how?

It appears a multi-step solution is needed.
  • First, running out of TCP connections in a scenario like ours is a JCAPS bug addressed in an ESR (110348) and rolled into JCAPS 5.1.3 Update Release 3. We installed this update and noticed a marked improvement. Running my simple 75-user simulated test, transactions took 2521 ms to round trip (compared to the previous time of 20128 ms). While this is a great improvement, an average of 2.5 seconds is still really slow for this simple transaction.
  • We reported our findings to Sun and they responded with 3 additional suggestions to try.
    • The first suggestion was to turn on the socket pooling in the JMS server. This is accomplished by setting the JMS URL to
      stcms://hostname:18007?com.stc.jms.socketpooling=true
      in the eDesigner's Environment Explorer setting for this logical host.
    • The second suggestion was to change the registry setting for TcpTimedWaitDelay to 30. This setting controls how long (in seconds) a closed connection waits before its TCP resources are made available to the system again.
    • The final suggestion was to change the registry setting for MaxUserPort to 65534, making more TCP port numbers available.
    The combination of these changes led to an amazing improvement in performance: 110 ms for the previously run 75-user test - or as I like to say, "pretty darn quick".
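
To get a feel for why the two registry settings matter, here is a back-of-the-envelope sketch of how many new outbound connections per second a host can sustain before it runs out of ports. It assumes every request opens a fresh connection (no socket pooling) and that each closed connection holds its port for the full TcpTimedWaitDelay. The "before" values (MaxUserPort of 5000, a delay of 240 seconds, ephemeral ports starting at 1025) are my assumption about the stock settings on older Windows servers, not figures from our environment.

```python
# Rough estimate of the connection rate a host can sustain before it
# exhausts its ephemeral ports. All "default" values are assumptions.

FIRST_EPHEMERAL_PORT = 1025  # assumed start of the ephemeral port range


def sustainable_connections_per_second(max_user_port, timed_wait_delay_s,
                                       first_port=FIRST_EPHEMERAL_PORT):
    """Ports available divided by the seconds each closed port stays tied up."""
    available_ports = max_user_port - first_port + 1
    return available_ports / timed_wait_delay_s


# Assumed stock settings vs. the values Sun suggested.
before = sustainable_connections_per_second(5000, 240)
after = sustainable_connections_per_second(65534, 30)

print(f"before: about {before:.1f} new connections per second")
print(f"after:  about {after:.1f} new connections per second")
```

Under those assumptions the ceiling moves from roughly 17 to roughly 2,150 new connections per second, which lines up with why the port exhaustion went away.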

So, what's my opinion on all this? Obviously I'm really happy with the results, but the steps leading to the solution are what have me scratching my head a little.
  • ESRs are unnecessarily hard to implement. First you need to determine the prerequisites and corequisites to install with these patches. Sometimes these are listed in the ESR documentation, but often they aren't and a call to Sun is needed to sort it all out.
    Then once you have all the ESRs you need, the repository, integration server, and eDesigner all need the patches before you recompile and redeploy the code. Needless to say, this is pretty time consuming.
  • I did not see any reference to this JMS server URL change in any documentation. It seems that the URL is the only place you can change this setting. There is no administrative console to make this change. Bummer, since it seems pretty important.
  • As for the registry settings, environment changes are often the first thing forgotten and the last remembered when upgrading to a new server. I don't like making changes that affect my applications but aren't checked into some sort of source control.
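
One way to get the registry changes under source control is to keep the values in a small script that renders a .reg file. The following is only a sketch: the key path is the standard location of the Windows TCP/IP parameters, the function and variable names are made up for illustration, and I have not run the generated text through regedit.

```python
# Render the two TCP settings as .reg file text so the change can be
# checked in and re-applied on a new server. Names here are illustrative.

TCPIP_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
)

# The values Sun suggested in this post.
TCP_SETTINGS = {
    "TcpTimedWaitDelay": 30,
    "MaxUserPort": 65534,
}


def render_reg_file(key, dword_values):
    """Return .reg text that sets each name under key to a DWORD value."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in dword_values.items():
        # DWORDs are written as eight lowercase hex digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_reg_file(TCPIP_PARAMETERS_KEY, TCP_SETTINGS))
```

The output can be saved next to the application code, so the environment change travels with the release instead of living in someone's memory.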

Saturday, December 6, 2008

Side Effects May Include...

Software development is complex. A lot of communication is needed to coordinate with users and between teams to synchronize efforts. On large teams developing distributed systems, effective communication is exponentially harder. In addition to the users and development teams, you also need to deal with system admins, DBAs, network, and change control specialists.

In many environments, this latter group of people (the system admins, DBAs, etc.) does not get involved with the project until the late stages. The development team might have free rein over development databases, for instance, but when the code migrates to a certification environment things are suddenly very different. Teams move from surroundings where they have total control of machine settings, databases, and resources to an environment where they have none.

During the migration to a test or production environment, things are often "forgotten": things that don't make it into an application's change control system, such as database schema updates, new environment settings, and property file changes. If these changes are not documented for migration to production, the result is a disaster: an embarrassing mad scramble to remember and apply them on deployment day. No one feels good after backing out a bad production rollout of a system.

The knee-jerk reaction when something like this goes wrong is to add more process - more forms, more user sign-offs, an increase in role-based access to systems. This seldom fixes the problem, however - it just adds more overhead to getting the problem fixed once it rears its ugly head again.

What's really needed is a better process through automation, not more process. Automate testing of the system. Create batch files to apply schema updates, change environment variables, deploy the application, and test in a single step. Use this script in your test environment and run the same script when you migrate to production.
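
As a rough illustration of the "single step" idea, here is a minimal sketch of a runner that executes each migration step in order and stops at the first failure. The step names and commands are placeholders I made up (each one just prints a message); a real script would call your own schema, configuration, deployment, and test tools.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, command) pairs in order; raise on the first failing step.

    Returns the names of the steps that completed.
    """
    completed = []
    for name, command in steps:
        print(f"==> {name}")
        result = subprocess.run(command)
        if result.returncode != 0:
            raise RuntimeError(
                f"step '{name}' failed with exit code {result.returncode}"
            )
        completed.append(name)
    return completed


# Hypothetical steps: each command is a stand-in that only prints a line.
STEPS = [
    ("apply schema updates", [sys.executable, "-c", "print('schema updated')"]),
    ("set environment", [sys.executable, "-c", "print('environment set')"]),
    ("deploy application", [sys.executable, "-c", "print('deployed')"]),
    ("run smoke tests", [sys.executable, "-c", "print('tests passed')"]),
]

if __name__ == "__main__":
    run_steps(STEPS)
```

Because the same list of steps runs in test and in production, anything that was needed to make the test environment work is, by construction, part of the production rollout too.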

All the time we try to sell the business on automating their processes through the programs we write. It always baffles me that, as software developers, we seldom take our own advice.

Wednesday, December 3, 2008

Secret Sauce

Good things always have that "secret sauce" - the element that sets them apart from the pack and makes them better than anything else. The "secret sauce" takes something ordinary and makes it extraordinary. It's a Mac vs. a laptop. Disney World vs. a theme park. A Big Mac vs. a hamburger.

When I hear company execs talk about what sets their companies apart, it seems there's no real "secret" to the sauce after all. It boils down to hard work and a commitment to your customers. A commitment to a better user experience. A commitment to treating every guest interaction as "special". A commitment to a good hamburger every time. There's no magic switch you can flip... it's a commitment to excellence through hard work. Looking for a shortcut to this kind of success is a waste of time.

All too often I've seen software development teams look for magic switches rather than commit to the hard work of fixing the underlying problem. For instance, attempting to fix database problems through caching when the real problem is in the underlying schema. Changing the graphics on a website while ignoring usability problems. Refusing to correct a broken architecture by continuing to throw more server resources at the application.

I've never seen one of these approaches work long term, and the time spent exploring these workarounds took time and resources away from fixing the actual underlying problem and from long-term success.

In each instance, the commitment to fixing the problem was abandoned in favor of a quick and inexpensive Band-Aid to hide the flaws.

It's why these products struggle to achieve success while the Mac, Disney World, and the Big Mac continue to shine as examples of truly great products.