Monday, October 5, 2015

OM12 DW Database Error. EventID 31553: ‘Arithmetic overflow error converting IDENTITY to data type int’

An OM12 environment had some MPs updated, including the core MPs based on UR#9 AFTER the required SQL scripts were applied. So far so good.

Out of the blue, the OpsMgr event log on the MS servers started to log EventID 31553 with the message ‘…Data was written to the Data Warehouse staging area but processing failed on one of the subsequent operations. Exception 'SqlException': Sql execution failed. Error 8115, Level 16, State 1, Procedure ManagedEntityChange, Line 237, Message: Arithmetic overflow error converting IDENTITY to data type int…’

I’ve seen a lot of 31553 events, but never before had I seen the message Arithmetic overflow error converting IDENTITY to data type int.

So I reached out to some people I know. And luckily I got a response.

As it turns out, the related table ran out of IDs. This can be checked by running this query against the Data Warehouse database: DBCC CHECKIDENT ("Managedentity"). The value should be lower than 2147483647.
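The limit mentioned above is simply the largest value a signed 32-bit integer can hold. As a minimal sketch (the function names and the sample value are hypothetical, not taken from SCOM or SQL Server), this is how you could work out how much headroom an identity value has left once you have read it from the DBCC CHECKIDENT output:

```python
# Largest value a signed 32-bit integer can hold, which is the ceiling
# for an identity column of data type int.
INT_MAX = 2**31 - 1  # 2147483647


def identity_headroom(current_identity: int) -> int:
    """Return how many more IDs can be issued before the column overflows."""
    return max(INT_MAX - current_identity, 0)


def is_exhausted(current_identity: int) -> bool:
    """True when the identity value has reached the int ceiling."""
    return current_identity >= INT_MAX


if __name__ == "__main__":
    # Hypothetical value, standing in for what DBCC CHECKIDENT reported.
    current = 2147483000
    print("IDs left:", identity_headroom(current))
    print("Exhausted:", is_exhausted(current))
```

A value close to the ceiling is a good reason to plan the fix before the event log starts filling up.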

Before I start: Microsoft PFE recommends opening a case with Microsoft Customer Support Services, so think TWICE before running this fix.

Be sure you know what you are doing. When possible, do this with a SQL DBA so you have additional SQL knowledge and experience available. Also know that fixing this issue for one table may not be enough, since other tables may suffer from the same issue as well. In cases like these it’s better to open a case with Microsoft Customer Support Services.

Reseed the related table when the value has reached 2147483647, the maximum value of the int data type.

  1. First and foremost, BACKUP both SCOM databases, and make sure the backups are in working order (aka: they can be 100% restored). Also, BEFORE running the backup, STOP all SCOM related services on ALL SCOM Management Servers so NO new data will be processed by the SCOM databases. Start these services again only AFTER this procedure has been run;
  2. Find the current number of rows in the table. This number will be used in Step 3 as value n: select count(*) from managedentity;
  3. Reseed the identity of this table with this query: DBCC CHECKIDENT ("Managedentity",RESEED, n+1). Again: n is the value found in Step 2. Also: there should be no spaces between the , and RESEED;
  4. Check and confirm that the value was changed: DBCC CHECKIDENT ("Managedentity");
  5. Enable the SCOM related services on the Management Servers;
  6. Confirm that EventID 31553 is gone.
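To make the n+1 in Step 3 concrete, here is a small sketch that builds the reseed statement from the row count found in Step 2. The helper name and the example row count are hypothetical. It only produces the text of the statement, in the same form as used in Step 3, and doesn’t connect to any database:

```python
import re


def build_reseed_statement(table: str, row_count: int) -> str:
    """Build the DBCC CHECKIDENT reseed statement for Step 3.

    row_count is the value n returned by 'select count(*)' in Step 2;
    the new seed is n + 1.
    """
    # Only accept plain table names so nothing unexpected ends up in the statement.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"Unexpected table name: {table!r}")
    if row_count < 0:
        raise ValueError("row_count can't be negative")
    return f'DBCC CHECKIDENT ("{table}",RESEED, {row_count + 1})'


if __name__ == "__main__":
    # Hypothetical row count, standing in for the result of Step 2.
    print(build_reseed_statement("Managedentity", 125000))
```

Review the generated statement with your SQL DBA before running it against the Data Warehouse database.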

Chances are, however, that other tables will show up in the OpsMgr event log with the same error. Run Steps 2 to 6 for each of them in order to remedy it. And again, STOP the SCOM related services on the Management Servers before going through these steps and start them again afterwards.

Thursday, September 17, 2015

Updated MPs: WS2012 DHCP & ADS

A few days ago Microsoft released updates for these two MPs:

  • Windows Server 2012 DHCP
    • Version: 6.0.7295.0
    • Change: ‘…The properties view of Failover Server Relationship did not display all the IP addresses, with this fix the properties view of Failover Server Relationship will display all IP addresses…’
    • Download location:

  • Windows Server 2012 Active Directory Domain Services
    • Version: 6.0.8321.0
    • Change: ‘…The “AD_Op_Master_Response.vbs” script in the Active Directory Domain service MP failed on some environments where region for local system is set to a non EN-US locale. This was due to a date field not being stored in registry in the date format of the region/locale. With this fix, the script doesn't fail when region is set to a non EN-US locale…’
    • Download location:

Like all other MPs: Test them before putting them into production.

New Community MP: Monitor & Reduce Health Service Store Size

At the beginning of this month Jimmy Harper released a new MP for the community. This MP monitors the Health Service Store size and can even reduce it.

On many servers this is not an issue. But sure enough, you’ve got a few servers with very limited disk space, where every MB saved is welcome. On occasions like these it certainly pays off to reduce the size of the Health Service Store.

Normally you have to do this by hand. Jimmy Harper has developed an MP which monitors, collects and reduces the size of the Health Service Store file.

Want to know more? Go here, read Jimmy’s posting and download the MP for FREE!

Thanks Jimmy for sharing this MP with the community.

Updated MP: OpsMgr Self Maintenance MP Version

Yesterday the king of MP authoring, Tao Yang, released an updated version of the OpsMgr Self Maintenance MP, version

And (again) I am VERY impressed. The previous version was already awesome and something which should be PRESENT in ANY SCOM 2012x environment, but this update is even better. Besides some bug fixes it contains new features as well.

Please be advised to READ the guide of this MP from cover to cover, since good tuning/configuration is required in order to get the most out of it. Therefore RTFM is key here.

Even though it might take some time, it’s worth the effort since it’s like ‘Set & Forget’. So when the tuning/configuration phase is done, this MP will do what the name tells you: making SCOM 2012x maintain itself! It’s just awesome that an MP like this comes for FREE!

Thanks Tao for sharing this totally AWESOME MP with the community!

Monday, September 14, 2015

Repost/Cross Post: Dynamic OM12 R2 Groups WITH Heartbeat Alerts

When creating dynamic Groups containing Windows Servers for instance, you WON’T get an Alert when one of those servers is down. Ouch! When you don’t know that, it’s something which is going to bite you sooner or later.

There is always a WHY to something and this case isn’t different. YES, SCOM does check on the availability of the monitored Windows Server. Sure. But it does that kind of monitoring on a whole different Class.

As we all know, SCOM is all about Classes. And the Monitors which check for the availability of the Microsoft Monitoring Agent (MMA) and the related Windows Computer are targeted against the Health Service Watcher Class, and not against the Windows (Server) Computer Class:

Health Service Heartbeat Failure Monitor:

Computer Not Reachable Monitor:

As a result, when creating a dynamic Group in SCOM only containing Windows Server objects, and using this Group in your notification model, you WON’T get an Alert when one of those servers goes down. Sure, other Alerts will come up, but not an Alert like ‘Health Service Heartbeat Failure’ or ‘Failed to Connect to Computer’, while these Alerts do show you the root cause right away…

What to do about it?
Sure. You can get angry/frustrated with it and go on with your life. However, it’s SCOM, remember? So with a bit of copying and pasting XML you can achieve a lot!

No! MANUALLY adding the related Health Service Watcher object to this dynamic Group won’t fly. We WANT a dynamic Group, remember? A ‘set-and-forget’ Group, knowing this Group will always contain the correct members, AUTOMATICALLY.

So the initial investment will be a bit bigger, but when done, you’re truly done here.

This posting isn’t based on a brain wave I got, but on simply searching the internet for the right resources. And after some good searching I found what I needed. At the end of this posting I’ll share the resources I used and give credit to the people who deserve it.

Easy. Follow these few steps by example and you’ll be fine.

  1. Create a Dynamic Group which is dynamically populated with the Windows Computer objects. E.g: Example Group with this Dynamic Inclusion Rule:
    Basically ALL Windows Computers monitored by SCOM are added automatically to this Dynamic Group.
  2. Checking the members:
    As you see, Windows Computer objects only.
  3. Export the MP containing this Group, open the related XML in Notepad++ (for instance) and add this additional XML code between the tags </MembershipRule> and </MembershipRules>. So you’ve got something like this:
    Add the code where the red arrows point at:

    You got something like this:
  4. Increment the version number so you can differentiate between both versions of this MP. Save the modifications and import this updated version into your SCOM MG.
  5. Check the Group members:

So this REALLY works. However, there are some caveats to reckon with:

  1. Since you edited the underlying XML of this Group you CAN’T edit it anymore in the SCOM Console:
    The Create/Edit rules button is greyed out AND do you recognize the Query formula? Exactly, it’s the XML code you just added!
  2. The previously mentioned XML code ONLY works for SCOM 2012 R2! When running OM12 SP1 you’ll need to change the version numbers from 7585010 to 7084300.
  3. The same modification of the numbers goes for SCOM 2012 RTM. I don’t know the numbers but you can find them in the XML code of the MP, under the header <References>. Look for the Aliases for Microsoft.Windows.Library and Microsoft.SystemCenter.InstanceGroup.Library. Here you’ll find the correct numbers:
  4. Sometimes your MP will use other reference ALIASES. So check whether the ones used in your MP are correct. When not, adjust the XML code accordingly:

Used resources
As stated before, this posting came to be by using other resources, in chronological order:

  1. A posting written by Tim McFadden;
  2. A posting written by Jonathan Almquist;
  3. TechNet Forum for SCOM, thread 01;
  4. TechNet Forum for SCOM, thread 02 (Thanks Marthijn van Rheenen).

So ALL credit for this posting goes to these guys. Thanks men!

Wednesday, September 9, 2015

One Small Footprint For a Server, One Giant Leap For OMS

Welcome to the new world
Microsoft is reinventing itself. It’s in a huge transition from a company previously focused on ‘devices & services’ to an enterprise geared to the ‘mobile-first, cloud-first’ mantra. Even though Microsoft has brought marketing to a whole new level, in this particular case there is little marketing mumbo jumbo, if any at all.

The investments in and speed of development of Microsoft’s cloud offering are unprecedented, all across the ‘Azure board’. New features are added on an almost weekly basis to the whole Azure portfolio. Some are kept low key (like the Clutter feature in Office 365) whereas others get more exposure.

Fact is that Azure is an ever evolving cloud environment gaining more traction by the day. Microsoft’s whole workforce has shifted direction and is working in unison on the development of the cloud.

OMS has the same speed of development
OMS is no different here. Quite recently Microsoft introduced a new feature in OMS: Near real-time performance data collection. At first glance it might seem like a minor step, but – after having tested it thoroughly – it’s a giant leap for OMS.

I’ll tell you why.

NRT & supposed impact
The sample interval for near real-time (NRT) performance data collection by OMS is set by default to 10 seconds. Which makes sense, since the name of the new feature implies ‘near real-time’.

Being someone with a SCOM background, it made me wonder about the footprint of it all. How about memory and CPU load? How about network load? In other words, what kind of footprint does OMS with NRT performance data collection have on any given server?

Time to put it to the test.

The test environment
Any test is only as good as the environment used for it, together with the applied test scenario. So I decided to deploy two brand new VMs in my own test lab, identical to each other. I also deployed a new OMS Workspace in order to make sure the test wasn’t ‘contaminated’ with old settings I tested in my other OMS workspaces.


  1. 2 identical Windows 2012 R2 VMs (3 GB RAM, 1 vCPU, 1 logical drive C:\, workgroup member), NRT01 and NRT02;
  2. Both VMs placed on the same Hyper-V host, using the same storage, compute and network resources;
  3. One new OMS workspace, named NRTLab.

Item configuration:

  1. Server NRT01 got the Windows Agent, downloadable from the OMS workspace NRTLab (the Windows Agent is the Microsoft Monitoring Agent (MMA) with OMS Workspace connection capabilities);
  2. The Windows Agent on NRT01 connects ONLY to the NRTLab OMS Workspace;
  3. NRTLab isn’t connected to any SCOM 2012 Management Group nor any Azure Storage Accounts:
  4. NRTLab Solutions configuration: Log Search and System Update Assessment:
  5. NRTLab Logs configuration. Log Name: Operations Manager (Error & Warning):
  6. NRTLab NRT Performance Data Collection settings. OMS default with the default sample interval:
  7. NRTLab is happy and reports a 100% complete configuration:
  8. And yes, NRT01 is connected properly to NRTLab and data is coming in:


Now I’ve got enough resources to run a good test. How about a valid test scenario?

Test scenario
Say what? NRT02 has NO Windows Agent? Yes, that’s correct! This server has only ONE purpose: it’s a reference server!

Now I can see what kind of CPU, RAM and network load this server has compared to NRT01 running the Windows Agent reporting to NRTLab while collecting NRT performance data, OpsMgr event log entries (errors & warnings) & checking whether the server is missing out on any crucial updates (performed by the System Update Assessment Solution).

On both servers I defined a new Data Collector Set in Performance Monitor, in order to collect specific performance data:


NRT01 (running the Windows Agent):

  • Logical Disk > Current Disk Queue Length (C:)
  • Memory > Available MBytes
  • Network Adapter > Bytes Total/Sec
  • Network Adapter > Current Bandwidth
  • Process > % Processor Time (HealthService.exe & MonitoringHost.exe)
  • Process > IO Data Operations/sec (HealthService.exe & MonitoringHost.exe)
  • Process > Working Set – Private (HealthService.exe & MonitoringHost.exe)
  • Processor Information > % Processor Time


NRT02 (reference server, no Windows Agent):

  • Logical Disk > Current Disk Queue Length (C:)
  • Memory > Available MBytes
  • Network Adapter > Bytes Total/Sec
  • Network Adapter > Current Bandwidth
  • Process > % Processor Time (_Total)
  • Process > IO Data Operations/sec (_Total)
  • Process > Working Set – Private (_Total)
  • Processor Information > % Processor Time

I had these Data Collector Sets running for about 24 hours. No programs were opened, all MMC’s were closed (Performance Monitor included!), so these servers were simply running without being used except for their own running processes and services.

I ran these Data Collector Sets multiple times in order to establish a baseline. The results in this posting are based on the last run, from 20:43 9/7/2015 until 21:21 9/8/2015.

The results
And I must say this is the very reason I ran the Data Collector Sets multiple times. Simply because the results are very impressive.

Seeing is believing, so let’s take a look at the Report View of the Report of both Data Collector Sets:



As you can see, the memory footprint of the Windows Agent is really small. With the counter Process / Working Set – Private we see the number of bytes in use for both components of the Windows Agent: HealthService.exe (5.2 MB) and MonitoringHost.exe (11.8 MB).

This means that together they (the Windows Agent, actually) use 17 MB of RAM! I don’t know about you, but to me that’s really small.

Looking at the CPU footprint you can see it’s small as well. The Windows Agent consumes about 0.151 % Processor Time (% Processor Time NRT01 – % Processor Time NRT02).

When looking at process level, we see that HealthService.exe consumes 0.014 % Processor Time and MonitoringHost.exe 0.034. Together even less than 0.05 (0.048)!

And the load on the network (Bytes Total/sec) is also very low: 413.469 Bytes Total/sec (0.00039 Megabyte!) for the Windows Agent (Bytes Total/sec NRT01 – Bytes Total/sec NRT02).
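The arithmetic behind these numbers is easy to redo yourself. The sketch below uses only the figures quoted in this posting. The per-day network volume is an extra figure derived from them, assuming the average rate holds for a full 24 hours, and the names are just for illustration:

```python
# Figures quoted from the Report View of both Data Collector Sets.
HEALTHSERVICE_RAM_MB = 5.2
MONITORINGHOST_RAM_MB = 11.8
HEALTHSERVICE_CPU_PCT = 0.014
MONITORINGHOST_CPU_PCT = 0.034
AGENT_NET_BYTES_PER_SEC = 413.469  # Bytes Total/sec NRT01 minus NRT02

BYTES_PER_MB = 1024 * 1024
SECONDS_PER_DAY = 24 * 60 * 60


def agent_footprint() -> dict:
    """Combine the per-process figures into totals for the Windows Agent."""
    net_mb_per_sec = AGENT_NET_BYTES_PER_SEC / BYTES_PER_MB
    return {
        "ram_mb": HEALTHSERVICE_RAM_MB + MONITORINGHOST_RAM_MB,
        "cpu_pct": HEALTHSERVICE_CPU_PCT + MONITORINGHOST_CPU_PCT,
        "net_mb_per_sec": net_mb_per_sec,
        # Derived figure (assumption): the average rate sustained for 24 hours.
        "net_mb_per_day": net_mb_per_sec * SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in agent_footprint().items():
        print(f"{name}: {value:.5f}")
```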

But how about the network load for NRT Performance data collection only? The OpsMgr Engineering Team states: ‘… for a particular computer, a given counter instance (e.g., Processor(_Total)\% Processor Time) with 10 second sample interval will send ~1MB per day (~1MB/day/counter instance)…’.

I contacted Microsoft about this and they told me this is UNCOMPRESSED data! Since it gets compressed, these values are even lower! And they assured me this is thoroughly tested and triple checked.

I am amazed! Never did I expect to see such a SMALL footprint of the OMS Agent (AKA Windows Agent) on any given monitored server.

Since OMS uses a cloud based state of the art back end for data processing, it doesn’t have the potential bottlenecks we may see with on-prem SCOM installations. So data comes in, is processed very fast and is shown in your OMS workspace in a matter of seconds. Now that’s NEAR REAL-TIME!!!
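The quoted rule of thumb (~1MB/day/counter instance at a 10 second sample interval, uncompressed) can be turned into a rough upper bound for your own servers. This is a sketch under two assumptions that are mine, not Microsoft’s: the volume scales linearly with the number of samples, and the number of counter instances (20 in the example) is purely hypothetical:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(sample_interval_s: int = 10) -> int:
    """Number of samples one counter instance produces per day."""
    return SECONDS_PER_DAY // sample_interval_s


def estimated_mb_per_day(
    counter_instances: int,
    sample_interval_s: int = 10,
    mb_per_instance_at_10s: float = 1.0,
) -> float:
    """Rough uncompressed daily volume for one computer.

    Assumption: volume scales linearly with the number of samples, so a
    20 second interval sends half of what a 10 second interval sends.
    """
    scale = samples_per_day(sample_interval_s) / samples_per_day(10)
    return counter_instances * mb_per_instance_at_10s * scale


if __name__ == "__main__":
    # Hypothetical server collecting 20 counter instances.
    print("Samples per day per instance:", samples_per_day())
    print("Estimated MB/day (uncompressed):", estimated_mb_per_day(20))
```

Since the data gets compressed before it is sent, the real numbers should end up below this estimate.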

Since the footprint of OMS is so small I see no reason NOT to use OMS on any important server. Connect the Windows Agent with an on-prem SCOM environment and you’ve got the best of both worlds: on-prem SCOM and state of the art (and ever evolving) OMS in the Cloud!

Check it yourself
Both Performance Monitor Reports used for this posting can be downloaded from my OneDrive and opened in Performance Monitor, so you can see it for yourself: NRT01 and NRT02.

But even better, start using OMS today and see what it can do for your environment.

Monday, September 7, 2015

Comparing SCOM And OMS = Comparing Apples And Oranges

Okay. Running a blog is something I really like. But with it do come certain responsibilities. Like keeping the blog clean of anything based on assumptions and lacking good investigation.

Until recently I succeeded in this approach. However, last week I posted an article which fell below that standard. This posting was about the newest feature in OMS, near real-time performance data collection.

In this posting I assumed this kind of near real-time performance data collection would have a noticeable impact on the performance of the monitored servers. Also I compared it to some performance collection Rules present in the Windows Server OS MP, used by SCOM.

As it turned out I was wrong on both counts. Both assumptions were based on my SCOM experiences. However, as it turns out, OMS is a whole different kind of beast (no pun intended!), even though it runs a Microsoft Monitoring Agent (MMA) and uses Intelligence Packs. So the look & feel might be a bit like SCOM, but under the covers it works totally differently compared to an on-prem SCOM solution.

I want to say sorry to all the readers of this blog, Microsoft included. Simply because you expect to find information here that is based on facts and not on assumptions. This particular posting failed on that account.

So I’ve pulled the old posting and will replace it soon by a new one, all about the footprint of the OMS Agent on a server, collecting near real-time performance data using the default interval of 10 seconds. This posting won’t be based on assumptions but on some serious testing.

During the weekend I had more time to put things to the test. This way I’ve found out that OMS has a significantly smaller footprint on the monitored servers than I previously assumed.

Spoiler Alert
Over the weekend I rolled out two identical servers in my own lab (NRT01 and NRT02), both running Windows Server 2012 R2. Same disk, CPU and RAM configuration.

In OMS I created a new Workspace (NRTLab), especially for this test. From this new OMS Workspace I downloaded the Microsoft Monitoring Agent (MMA) and installed it ONLY on the NRT01 server. The NRT02 server is purely a reference server. It has NO MMA whatsoever.

In the new OMS Workspace I configured ONLY the collection of the OpsMgr event logs (error and warnings), the default set of performance counters WITH their default sample interval of 10 seconds and last (but not least), the System Update Assessment Solution.

On both servers I defined a new Data Collector Set in Performance Monitor, all aimed at collecting specific performance data (CPU, memory, NIC and process related items) in order to get a better and detailed understanding of the footprint of the OMS MMA in general, and the collection of real time performance data specifically.

And I must say that I am really IMPRESSED about how small that footprint is. About an hour ago I restarted the Data Collectors on both servers for the last time so I’ve got multiple test results to ‘read’ and translate into a new blog posting.

So stay tuned!