One issue that comes up every so often with Edgesight is performance. Little by little, an administrator might see the following behaviour:

- Report Rendering takes forever,
- Payloads are queuing in the webload directory and are imported very slowly,
- Servers will not update the Database,
- Nightly maintenance takes a very long time – and may eventually fail,
- Grooming error messages are showing up all over the place in the console,
- and other similarly annoying stuff…

Those symptoms are usually caused by too much data residing in the SQL database. There is so much to process that every single job takes ages. Tasks that should be performed by the SQL engine build up until the system comes to a halt – if there is no disk space left – or becomes desperately sluggish and downright unusable.

This is where the administrator realizes that maybe the SQL server was not up to the task: there might be too many servers reporting in, the SQL server might be under heavy load, the retention period might have been extended, and so on. However, reports need to be run and sent to management right away. The administrator needs to find a way out of the predicament today so that those reports can go out tomorrow morning; correcting the deployment design can wait.

Finding a way out of this situation in a short amount of time is possible. My method is to find out where the bulk of the data is, then drill down to identify which type of data can be removed manually, either by running a DELETE statement or by truncating tables whose content is not needed.


1 – Find out which filegroups are the largest.

The default installation of Edgesight creates 7 filegroups, but the administrator may have customized the installation. Regardless of how it was set up, let's first list the database files and their sizes:

SELECT DB_NAME(database_id) AS DatabaseName,
       name AS Logical_Name,
       physical_name,
       (size * 8) / 1024 AS SizeMB
FROM sys.master_files
WHERE DB_NAME(database_id) = 'Edgesight_DB_Name'
GO

Depending on the size of the deployment, the result will vary, but what we want to tackle are the largest filegroups. I have seen filegroups exceeding 100 GB.
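
Since sys.master_files returns one row per file rather than per filegroup, a small aggregation (a sketch, to be run inside the Edgesight database) sums the file sizes per filegroup:

-- Sketch: total size per filegroup (run in the Edgesight database).
-- size is reported in 8 KB pages, hence the * 8 / 1024.
SELECT fg.name AS FileGroupName,
       SUM(df.size) * 8 / 1024 AS SizeMB
FROM sys.filegroups fg
     INNER JOIN sys.database_files df ON df.data_space_id = fg.data_space_id
GROUP BY fg.name
ORDER BY SizeMB DESC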

2 – Find out the table-to-filegroup mapping:

select 'table_name' = object_name(i.id),
       i.indid,
       'index_name' = i.name,
       i.groupid,
       'filegroup' = f.name,
       'file_name' = d.physical_name,
       'dataspace' = s.name
from   sys.sysindexes i,
       sys.filegroups f,
       sys.database_files d,
       sys.data_spaces s
where  objectproperty(i.id, 'IsUserTable') = 1
  and  f.data_space_id = i.groupid
  and  f.data_space_id = d.data_space_id
  and  f.data_space_id = s.data_space_id
order by f.name, object_name(i.id), groupid
go

The output of the query will tell you which tables belong to which filegroup. So, for example, if step 1 shows that FG2 is oversized, step 2 will tell you that the ALERT and ALERT_PARAM tables need to be looked at. Similarly, if FG6 is oversized, the CORE_NET_TRANS and CORE_NET_STAT tables need investigation.
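
To zoom in on a single suspect filegroup, the same mapping query can be filtered down (a sketch; 'FG2' is an example name, substitute the one found in step 1):

-- Sketch: list only the user tables sitting in one suspect filegroup.
-- 'FG2' is an example -- use the filegroup name found in step 1.
select distinct object_name(i.id) as table_name
from   sys.sysindexes i
       inner join sys.filegroups f on f.data_space_id = i.groupid
where  objectproperty(i.id, 'IsUserTable') = 1
  and  f.name = 'FG2'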
You can also directly find the row count per table and investigate the largest tables:

SELECT sysobjects.name, sysindexes.rows
FROM sysobjects
INNER JOIN sysindexes ON sysobjects.id = sysindexes.id
WHERE sysobjects.type = 'U'
  AND sysindexes.indid < 2
ORDER BY sysindexes.rows DESC
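
Row counts alone can understate wide tables; on SQL Server 2005 or later, a sketch using sys.dm_db_partition_stats also shows how much space each table reserves (it requires the VIEW DATABASE STATE permission):

-- Sketch: rows and reserved space per user table (heap or clustered index only).
SELECT o.name AS TableName,
       SUM(ps.row_count) AS TotalRows,
       SUM(ps.reserved_page_count) * 8 / 1024 AS ReservedMB
FROM sys.dm_db_partition_stats ps
     INNER JOIN sys.objects o ON o.object_id = ps.object_id
WHERE o.type = 'U'
  AND ps.index_id < 2
GROUP BY o.name
ORDER BY ReservedMB DESC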

3 – Find out about the grooming schedule configured by the admin

You can see this in the Edgesight console, but I find it more convenient to run the following query:

select f.table_name,
       f.view_name,
       m.default_days,
       m.groom_days
from   maint_table_config m
       inner join table_def f on f.tableid=m.tableid
where m.default_days is not NULL

Example output:

[screenshot: grooming schedule query output]
The output also includes the default configuration, so you can see straight away what was changed and whether the large tables are affected by these changes.
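
A quick way (a sketch built on the same two tables) to list only the entries whose retention was changed from the default:

-- Sketch: show only the tables whose groom_days differs from the default.
select f.table_name,
       m.default_days,
       m.groom_days
from   maint_table_config m
       inner join table_def f on f.tableid = m.tableid
where  m.default_days is not NULL
  and  m.groom_days <> m.default_days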


4 – Drill down and find out what you can remove

Now that you know the largest filegroups and the largest tables, you can start drilling down to find out what you can remove. This part is a little more complex because it depends on the content of the affected tables. So let's say the count per table (step 2 above) returns the following (first 20 tables only):

[screenshot: row counts for the first 20 tables]

The table above tells us which tables we should try to reduce. The first thing we need to know is whether records older than the cut-off date are still present, which would mean that the grooming was not successful. To find out, compare the oldest record in the table with the grooming schedule from step 3.

Let's take the USAGE table as an example, since it tops the list. Checking the grooming schedule for the USAGE table (from step 3), we see that the retention period is 90 days.

The grooming for this table is keyed on the dtperiod field, so we can find the oldest record and check whether it is older than the cut-off date (today minus 90 days):
select MIN(dtperiod) from usage

Result:

2/17/2012 18:00
So if today is August 22, 2012 and the oldest record dates back to February 17, 2012, the grooming definitely did not work. You can determine how many records are older than the cut-off date by running this:

select COUNT(*) from usage
where dtperiod < '2012-05-22 18:00:00'

Result:

35055214
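
If you would rather not hard-code the date, a small variant (a sketch) lets SQL Server derive the cut-off from the current date and the 90-day retention found in step 3:

-- Sketch: compute the cut-off as today minus the 90-day retention.
select COUNT(*) from usage
where dtperiod < DATEADD(day, -90, GETDATE())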

The count tells us that the grooming procedure failed to remove 35M records out of 49M, so we are going to have to remove them manually.
Since every single DML statement takes a long time to run, we cannot afford to delete all records older than the cut-off date in one statement; a single huge DELETE would also bloat the transaction log and hold locks for its whole duration. Instead, we want to delete those records in batches of, say, 100,000 rows. We would run this:

declare @cnt int;
declare @date datetime;

-- Cut-off date: anything older than this gets deleted.
set @date = '2012-05-22 10:03:24.000'

if exists (select instid from usage where dtperiod < @date)
    set @cnt = 1
else
    set @cnt = 0

while @cnt > 0
begin
    -- Delete one batch of 100,000 rows, then re-check for remaining rows.
    delete top (100000) from usage where dtperiod < @date

    if exists (select instid from usage where dtperiod < @date)
        set @cnt = 1
    else
        set @cnt = 0
end;

go
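
A leaner variant (a sketch with the same effect) relies on @@ROWCOUNT instead of re-running the EXISTS check after every batch, which saves one extra scan of the table per iteration:

declare @date datetime;
set @date = '2012-05-22 10:03:24.000'

while 1 = 1
begin
    delete top (100000) from usage where dtperiod < @date
    -- Stop once a batch deletes nothing.
    if @@ROWCOUNT = 0 break
end;
go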

The same logic can be applied to the CORE_NET_TRANS table and, by extension, to all tables with a dtperiod field. Once the largest tables have been groomed using this method, we can reclaim free disk space for the relevant filegroups and normal operations should resume.
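
Note that the deletes free space inside the data files but do not hand it back to the operating system by themselves. To actually reclaim the disk space, the relevant files can be shrunk; 'EdgeSight_FG6' below is a hypothetical logical file name, use the Logical_Name values returned in step 1:

-- Sketch: shrink one data file to return free space to the OS.
-- 'EdgeSight_FG6' is a hypothetical logical file name (see step 1);
-- the second argument is the target file size in MB.
DBCC SHRINKFILE ('EdgeSight_FG6', 10240)
GO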