Little-known issues with famous cloud monitors

As we saw in a previous blog post [1], titled "What Amazon is not telling you about AWS?", there are bumps on Amazon’s Cloudy highway. You can use one of the many monitoring solutions to warn you when things go wrong, but most will only tell you after a bump has already hit you. Our experiments found that Amazon’s own CloudWatch has poor diagnostics and reported every issue as a CPU problem.

Nagios is one of the long-standing free monitoring products on the market, and a great one. It offers many features but also considerable complexity: it takes experienced IT professionals to install and manage [2]. The time and effort spent managing this software distracts you from the business at hand, which is using the cloud in a simple way. In our limited testing, Nagios failed to detect real performance issues. Other users have reported Nagios checks starting 45 seconds later than their scheduled times [3].
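
To make the latency figure concrete: check latency is the gap between when a check was scheduled to start and when it actually started. The toy model below is a minimal sketch, assuming a hypothetical fixed pool of workers that runs checks in scheduled order; it is not how Nagios schedules checks internally, and the `Check` class and `simulate_latency` function are illustrative names, not part of any Nagios API.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    scheduled: float  # seconds: when the check was supposed to start
    duration: float   # seconds: how long the check takes to run


def simulate_latency(checks, workers=1):
    """Toy model (hypothetical): run checks in scheduled order on a fixed
    pool of workers and return {name: latency}, where
    latency = actual start time - scheduled start time."""
    free_at = [0.0] * workers  # time at which each worker becomes idle
    latency = {}
    for check in sorted(checks, key=lambda c: c.scheduled):
        # Hand the check to whichever worker frees up first.
        i = min(range(workers), key=lambda w: free_at[w])
        start = max(check.scheduled, free_at[i])
        free_at[i] = start + check.duration
        latency[check.name] = start - check.scheduled
    return latency


if __name__ == "__main__":
    due_now = [Check("a", 0.0, 20.0), Check("b", 0.0, 10.0), Check("c", 0.0, 5.0)]
    print(simulate_latency(due_now))             # one worker
    print(simulate_latency(due_now, workers=2))  # two workers
```

With three checks all due at t = 0 and a single worker, check "c" starts 30 seconds late only because it waited behind the other two; with a second worker its latency drops to 10 seconds. A monitor whose checks run late reports problems late, no matter how accurate each check is.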

A more recent solution on the market is New Relic. Although it has enjoyed immense market success, not all users are happy: some sites were reported as fast while their users experienced slowness [4]. This happens when the wrong metrics are monitored.
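
One common way a "wrong" metric can hide slowness is watching the average response time instead of the tail. The sketch below is our own illustration, not necessarily what happened in [4]; the sample data, the 1000 ms threshold and the `summarize` helper are hypothetical. It compares the mean with a nearest-rank 95th percentile and with the share of slow requests.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # integer math until the division
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms, slow_threshold_ms=1000):
    """Hypothetical dashboard summary of request latencies in milliseconds."""
    return {
        "mean_ms": mean(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
        "slow_share": sum(t > slow_threshold_ms for t in latencies_ms) / len(latencies_ms),
    }


if __name__ == "__main__":
    # Hypothetical sample: 90 fast requests and 10 very slow ones.
    sample = [100] * 90 + [3000] * 10
    print(summarize(sample))
```

Here the mean is 390 ms, which looks healthy, while the 95th percentile is 3000 ms and one request in ten is slower than a second. A dashboard showing only the mean would call this site fast.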

Another problem is reminiscent of the famous Heisenberg Uncertainty Principle, which says that one can never simultaneously know both the exact position and the exact momentum of a particle. Something similar applies to measuring a server’s performance: every measurement costs some CPU time and memory, and so disturbs the very thing being measured. Users have widely reported such problems [5]. So is there a silver-bullet monitoring tool?
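
The cost of measuring can itself be measured. The sketch below assumes a hypothetical `sample_metrics` function standing in for one sampling step of a monitoring agent; real agents do different (usually more) work. It uses Python's standard `time.process_time` to count the CPU time the probe consumes and `tracemalloc` to record the peak memory it allocates, in two separate passes so that tracemalloc's own bookkeeping does not inflate the CPU figure.

```python
import time
import tracemalloc


def sample_metrics():
    """Hypothetical stand-in for one sampling step of a monitoring agent."""
    # Build a small snapshot, as an agent might before shipping it off.
    return {"ts": time.time(), "cpu": time.process_time(), "payload": list(range(1000))}


def measure_cpu_cost(probe, runs=1000):
    """Average CPU seconds (user + system) this process spends per probe call."""
    start = time.process_time()
    for _ in range(runs):
        probe()
    return (time.process_time() - start) / runs


def measure_peak_memory(probe, runs=100):
    """Peak bytes allocated while the probe runs, as traced by tracemalloc."""
    tracemalloc.start()
    try:
        for _ in range(runs):
            probe()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    cpu = measure_cpu_cost(sample_metrics)
    peak = measure_peak_memory(sample_metrics)
    print(f"probe CPU per call: {cpu * 1e6:.1f} us, peak memory: {peak} bytes")
```

The exact numbers depend on the machine, but the cost is real and is charged to the same server the probe is reporting on.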

Stay tuned: we will reveal it in the next blog post.