PolyMon Blog is moving…

This blog is moving to polymon.blogspot.com

Please see that site for the current PolyMon blog.


PolyMon Version 1.1.0 Released

A new version of PolyMon has been released on CodePlex. See www.codeplex.com/polymon for more details.

This version addresses some reported issues and also provides new functionality:

Post-Event Scripting
This feature allows custom scripts to be run after a Monitor test has run. Scripts can be written in either PowerShell or VBScript.

Heartbeat Notification
This feature allows operators to set a specific time of day when they want to receive a “heartbeat” notification from PolyMon. Operators can choose to receive a summary of current Monitor statuses for any combination of OK, Warn or Fail states.

Queued Email Notifications
When operators are offline, any notifications they would normally receive are queued. Operators can now indicate how they want to receive these queued notifications: Each One Separately, Not at All, or Recapped. The Recapped option sends a single email listing the current status of every monitor that raised a notification, along with a brief list of the notifications themselves. This allows operators to quickly determine whether a monitor is still in a Warn/Fail state when they receive the recap email.
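To make the three options concrete, here is a minimal Python sketch of how a queue of notifications might be flushed when an operator comes back online. It is only an illustration: the names QueuedMode and flush_queue, the data shapes and the email wording are all assumptions, not PolyMon's actual implementation.

```python
from enum import Enum


class QueuedMode(Enum):
    # Hypothetical names for the three operator preferences.
    EACH = "each one separately"
    NONE = "not at all"
    RECAP = "recapped"


def flush_queue(queued, current_status, mode):
    """Return the list of email bodies to send for a queue of notifications.

    queued: list of (monitor_name, message) tuples, oldest first.
    current_status: dict mapping monitor_name to its status right now.
    """
    if mode is QueuedMode.NONE or not queued:
        return []
    if mode is QueuedMode.EACH:
        return [f"{monitor}: {message}" for monitor, message in queued]

    # RECAP: one email, grouped by monitor, leading with the current status.
    monitors_in_order = []
    for monitor, _ in queued:
        if monitor not in monitors_in_order:
            monitors_in_order.append(monitor)

    lines = []
    for monitor in monitors_in_order:
        lines.append(f"{monitor} is currently {current_status[monitor]}")
        for queued_monitor, message in queued:
            if queued_monitor == monitor:
                lines.append(f"  - {message}")
    return ["\n".join(lines)]


queued = [
    ("Ping web01", "Fail at 02:10"),
    ("Ping web01", "OK at 02:25"),
    ("SQL job backup", "Warn at 03:00"),
]
status_now = {"Ping web01": "OK", "SQL job backup": "Warn"}

each_emails = flush_queue(queued, status_now, QueuedMode.EACH)
recap_emails = flush_queue(queued, status_now, QueuedMode.RECAP)
print(len(each_emails))   # 3
print(recap_emails[0])
```

The recap leads each group with the monitor's current status, so an operator can see at a glance that the ping monitor has recovered while the SQL job is still in a Warn state.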

Upcoming Release

Well, it’s been a while since the last stable release of PolyMon (June 19, 2007 to be precise).

During that time I have mainly been working on adding post-monitoring actions. Essentially this will allow any monitor to execute custom scripts after it has run. Since the status and counter objects are available to these scripts, you would in practice be able to perform various actions based on the results of the monitor (for example, if a service monitor fails because the service it is monitoring is not running, you could start the service up, or if a log file has grown past a certain point you could write a script to purge all or part of the log file, etc).

The action scripting will, for now, support VBScript as well as PowerShell scripts. Technically PowerShell would have been sufficient for any scripting task, but because I’m not sure how many people have taken to PowerShell yet (which I highly recommend, btw), I opted to include VBScript.

I did look into incorporating VB.NET and C#.NET “scripting”, but one of the obstacles I found with this approach was adding references to other .NET libraries at run-time. I could have worked around this problem by providing an Add Reference capability, but it would create other problems such as making sure the referenced libraries are available (and in which location?) on the server that runs the monitors, not just the machine running the management console. Looks like PowerShell is a better approach. The problem there is that Microsoft does not allow redistribution of the PowerShell binaries, so the PolyMon install now requires an additional prerequisite on top of the usual .NET framework.

The next release of PolyMon, which will incorporate this new feature, is due in the next couple of weeks. I have yet to fully test it in a working production environment and have to finalize the installer and help files. For the last stable release I took the plunge and switched to WiX for creating the installation package. The learning curve was relatively steep and I ended up spending far more time than I would have liked, but in the end the effort was well worth it. I highly recommend using WiX if you are creating anything other than relatively basic installers in Visual Studio.

In addition to this new feature, several bug fixes will be rolled in (in particular one that addresses the internationalization problems when creating the PolyMon SQL database; thanks to Steinlaus for figuring this particularly frustrating problem out).

Some Comments on PolyMon Performance and Database Size

Several people have asked about, and frankly been concerned by, PolyMon’s SQL performance and database size requirements over time.

In terms of database growth, PolyMon includes a data retention policy mechanism that allows users to decide how much historical data they wish to retain for each monitor. This can be set for each individual monitor and is fairly granular, allowing aggregated views of long historical time periods without requiring an extensive amount of storage space.

Every time a Monitor runs in PolyMon, the event status result and any associated counters are stored in the database. For example, every time a Ping monitor runs, an OK/Warn/Fail status is stored in the database, along with counters for RTT and % loss. If you run such a monitor every 5 minutes, you will generate and store 288 events per day, or 105,120 events per year. Each record is very small in size (in the case of a Ping the total data size per event is about 150 bytes, including status and counter information). However, in addition to storing this event level data, PolyMon automatically creates roll-ups of this data into daily, weekly and monthly averages/totals. The retention policy determines, for each of these levels (raw, daily, weekly and monthly), how long data will be retained. Typically you might retain event level data for 1 month, daily data for 6 months, weekly data for 1 year and monthly data for 3 years. What this means is that storing historical data can become quite efficient depending on how long you want to retain event level data.
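The arithmetic behind this can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the function names are made up for illustration, the 150 bytes per event is the Ping figure quoted above, and the sketch assumes one roll-up row per monitor per day, week and month, which may not match the real table layout.

```python
MINUTES_PER_DAY = 24 * 60
BYTES_PER_PING_EVENT = 150  # approximate figure quoted for a Ping monitor


def events_per_day(interval_minutes):
    """Number of times a monitor runs per day at a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes


def retained_rows(interval_minutes, raw_days, daily_days, weekly_weeks, monthly_months):
    """Rows kept per monitor at each level (assumes one row per roll-up period)."""
    return {
        "raw": events_per_day(interval_minutes) * raw_days,
        "daily": daily_days,
        "weekly": weekly_weeks,
        "monthly": monthly_months,
    }


# Example policy: raw for 1 month, daily for 6 months,
# weekly for 1 year, monthly for 3 years.
rows = retained_rows(5, raw_days=30, daily_days=180, weekly_weeks=52, monthly_months=36)
raw_bytes = rows["raw"] * BYTES_PER_PING_EVENT
three_years_raw = events_per_day(5) * 365 * 3

print(events_per_day(5))    # 288
print(rows["raw"])          # 8640
print(sum(rows.values()))   # 8908
print(raw_bytes)            # 1296000
print(three_years_raw)      # 315360
```

Under these assumptions, a 5-minute Ping monitor keeps fewer than 9,000 rows in total while still covering three years of history, compared with over 300,000 rows if every raw event were kept for the same period.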

I have been using PolyMon in a production environment for 18 months now with approximately 150 monitors (many running every minute, some just 3 or 4 times a day, depending on the needs). Although our retention policies have not discarded any event level data yet (we’ve got enough disk space!), our database is currently just over 4GB. By setting retention periods to 1 month at the event level, we could probably reduce this size to less than 300MB (since our aggregate data barely totals over 5MB, the majority of the space being used up by event level data).

In terms of reporting performance we have not seen any problems either. Our current Event table (which holds event level status data for our monitors) contains over 1.2 million rows and our Counters table (which holds event level counter data) is a little over 1.5 million rows. Reports, database updates, aggregations, etc have been performing fine. I constantly try and tweak indexes, stored procedures or pre-built statistical info to further enhance performance.

Where there is a potential performance bottleneck, however, is in the monitoring itself. Currently the monitoring is agent-less and is performed by a single Windows service (PolyMon Executive) that runs every monitor in sequence, based on its frequency interval. Basically, the service has a primary timer that fires every n minutes (this is user configurable in minute increments). Each monitor is individually configured to repeat every nth timer cycle and the service evaluates whether it needs to run a specific monitor or not. If it does, it runs the monitor and moves on to the next one, otherwise it skips over it.
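As a rough illustration of this scheduling scheme, here is a small Python sketch. It is not the PolyMon Executive code: the Monitor class, the function names and the use of a simple modulo test to decide whether a monitor is due are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    name: str
    every_n_cycles: int  # run on every nth tick of the primary timer


def due_monitors(monitors, cycle):
    """Return the monitors that should run on this timer cycle, in order."""
    return [m for m in monitors if cycle % m.every_n_cycles == 0]


def run_cycles(monitors, cycles, run):
    """Simulate the primary timer firing `cycles` times.

    Monitors are run one after another; a slow monitor delays
    everything behind it, which is the bottleneck described here.
    """
    for cycle in range(1, cycles + 1):
        for monitor in due_monitors(monitors, cycle):
            run(monitor, cycle)


log = []
monitors = [Monitor("ping", every_n_cycles=1), Monitor("disk", every_n_cycles=3)]
run_cycles(monitors, 3, lambda m, cycle: log.append((cycle, m.name)))
print(log)  # [(1, 'ping'), (2, 'ping'), (3, 'ping'), (3, 'disk')]
```

With a 1-minute primary timer, "ping" runs every minute and "disk" every third minute, and on the third tick the two run back to back rather than at the same time.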

So far, at 150 monitors we have not experienced any problems with monitoring. However, this is definitely a bottleneck. The service is single threaded and therefore can only run one monitor at a time.

I recognized this was a scalability issue when I started coding PolyMon but decided to live with that drawback for a while. I intend to address this problem in two ways.

Firstly, I intend to make the Windows service that runs the monitors multi-threaded (with a user configurable number of threads) to help alleviate “blocking” problems (where one monitor that takes a long time to run essentially blocks any other monitors from running). However, this does not help in cases where the Windows server running the service is no longer able to keep up with running all the monitors (even with a multi-threaded service). In other words, adding threading allows PolyMon to scale up, but not out.
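The effect of a configurable thread count on the “blocking” problem can be shown with a short Python sketch. This is purely illustrative and says nothing about how the service will actually be written: run_all is a made-up function, and time.sleep stands in for whatever work a real monitor does.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_all(monitors, max_workers):
    """Run (name, duration) monitors on a pool and return names in completion order."""
    finished = []
    lock = threading.Lock()

    def run_one(name, duration):
        time.sleep(duration)  # stand-in for the real monitor check
        with lock:
            finished.append(name)

    # Leaving the with-block waits for all submitted monitors to complete.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for name, duration in monitors:
            pool.submit(run_one, name, duration)
    return finished


monitors = [("slow", 0.3), ("fast1", 0.01), ("fast2", 0.01), ("fast3", 0.01)]

single = run_all(monitors, max_workers=1)
pooled = run_all(monitors, max_workers=2)
print(single)  # the slow monitor finishes first and holds up the rest
print(pooled)  # the fast monitors finish while the slow one is still running
```

With one worker the slow monitor blocks the three fast ones behind it. With two workers the fast monitors complete on the second thread while the slow one is still running, but all of the work still happens on a single machine.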

Secondly, to address the scale-out issue, I intend to allow multiple Windows services (PolyMon Executives) to run on multiple servers. Each monitor definition will then not only contain information on what resource to monitor, but will also allow the user to select which service instance the monitor should be run from. Actually, the intent would be to allow users to either hardwire a monitor to a specific service or allow PolyMon itself to determine, dynamically, which service instance should be used to run the monitor.
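One possible shape for this is sketched below in Python. How PolyMon would choose an instance dynamically has not been decided, so the "least loaded instance" rule, the assign function and the instance names are all assumptions made for the sake of the example.

```python
def assign(monitors, instances):
    """Map each monitor to a service instance.

    monitors: list of (name, pinned_instance_or_None) tuples.
    instances: list of available service instance names.
    Hardwired monitors keep their instance; the rest go to whichever
    instance currently has the fewest monitors (an assumed policy).
    """
    load = {instance: 0 for instance in instances}
    placement = {}

    # Place hardwired monitors first so they count toward each instance's load.
    for name, pinned in monitors:
        if pinned is not None:
            placement[name] = pinned
            load[pinned] += 1

    # Spread the remaining monitors; ties go to the first instance listed.
    for name, pinned in monitors:
        if pinned is None:
            target = min(instances, key=lambda instance: load[instance])
            placement[name] = target
            load[target] += 1

    return placement


instances = ["exec-a", "exec-b"]
monitors = [
    ("ping-web", "exec-a"),
    ("sql-jobs", "exec-a"),
    ("wmi-disk", None),
    ("perfmon-cpu", None),
    ("ping-db", None),
]
placement = assign(monitors, instances)
print(placement["wmi-disk"])     # exec-b
print(placement["perfmon-cpu"])  # exec-b
print(placement["ping-db"])      # exec-a
```

A real implementation would probably weigh monitors by how often they run and how long they take rather than simply counting them, but the split between hardwired and dynamically placed monitors would look much the same.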

Now this will take a little while to implement, but I have already started researching it and laying the groundwork to be able to achieve this easily in a future release.

For now, the bottom line is that database size and performance have not been an issue, nor do I foresee them becoming an issue in the future. In a production environment monitoring over 150 resources (ping, wmi, perfmon, sql jobs, etc) we have not experienced any difficulties. However, I am aware of the current bottlenecks and will address those, as outlined above, in the first quarter of 2009.

As always, I very much welcome any feedback, positive or negative, you may have regarding PolyMon. I hope you find it useful and find that it can perform, in certain circumstances, as well as some of the commercial systems out there that charge an arm and a leg.

PolyMon 1.0.0 Released

PolyMon 1.0.0 has been released.

This release includes various performance enhancements at the database level, along with product stabilization and minor enhancements.

In addition, the installation procedure has been greatly simplified and now consists of a single setup package with optional feature selection for creating and installing the PolyMon database, PolyMon Executive Windows Service and PolyMon Manager.

See www.codeplex.com/polymon for more details.

PolyMon RC3 Released

RC3 has been released. It can be found on the link indicated in the sidebar.

RC3 includes numerous enhancements to the system. Some of the highlights:

  • PowerShell Integration: PolyMon now allows PowerShell scripts to be executed and uses some objects to pass State and Counter data between the PowerShell script and PolyMon. This effectively provides PolyMon a fully integrated scripting language and opens up monitoring to anything that can be done via PowerShell.
  • WMI: A new WMI monitor, with an integrated WMI browser and query builder is now available.
  • Data Aggregates: Data aggregation is now built-in with daily, weekly and monthly aggregations of both State and Counter data.
  • Data Retention Scheme: Each monitor can now specify how long data at various aggregate levels (from raw base data to monthly aggregates) is retained. This effectively allows a very granular control on database size.
  • Reporting: Reporting of historical State and Counter data has been completely rewritten and now supports the various aggregates implemented by PolyMon. In particular charting has been substantially enhanced.

Screenshots of most PolyMon features can be found here.

PolyMon RC2 Released

PolyMon RC2 has been released and is available at CodePlex (please see sidebar for link to PolyMon project hosted at CodePlex).

Various enhancements have been made including integrated help, bug fixes, updated install scripts for SQL 2000 as well as SQL 2005, visual enhancements, etc.