Weekly Access problems to a SharePoint site

Just recently, a client of mine raised an access issue that seemed to rear its head about every 8 days or so. What struck me was the distinct lack of errors issued by the system.

Approximately every 8 days, users will go to use the site and get a "failed to connect to site" error. IISReset and AppPool recycles don’t resolve the problem; only a reboot of the Web Front End does.

IIS Logs just show connections stopping at a fixed time and the SharePoint logs show nothing other than the usual trace log messages around timer jobs and index crawls.

It wasn’t until I examined the HTTPERR log file (normally located in C:\windows\system32\logfiles\HTTPERR) that I realised what was likely to be happening, as it was showing Connections_Refused entries.

2009-10-02 10:59:07 - - - - - - - - - 2_Connections_Refused -
2009-10-02 10:59:13 - - - - - - - - - 1_Connections_Refused -
2009-10-02 10:59:17 - - - - - - - - - 2_Connections_Refused -
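To get a feel for how often this is happening, the refusals can be tallied per day with a short script. This is only a minimal Python sketch, not part of the original investigation. It assumes the space-separated layout of the sample lines above and that the leading number in the reason field is a count of refused connections; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches reason fields such as "2_Connections_Refused".
# The field layout is assumed from the sample lines, not from a spec.
REFUSED = re.compile(r"^(\d+)_Connections_Refused$")


def refused_per_day(lines):
    """Return a Counter mapping date -> total refused connections."""
    totals = Counter()
    for line in lines:
        if line.startswith("#"):  # skip any header lines
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        # The first two fields are date and time; look for the reason after them.
        for field in fields[2:]:
            match = REFUSED.match(field)
            if match:
                totals[fields[0]] += int(match.group(1))
                break
    return totals


sample = [
    "2009-10-02 10:59:07 - - - - - - - - - 2_Connections_Refused -",
    "2009-10-02 10:59:13 - - - - - - - - - 1_Connections_Refused -",
    "2009-10-02 10:59:17 - - - - - - - - - 2_Connections_Refused -",
]
print(refused_per_day(sample))
```

For the three sample lines this reports five refused connections on 2009-10-02. Pointed at a real HTTPERR file, a sudden run of non-zero days is a decent hint as to when the pool started running dry.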

Basically, all incoming connections to a web server are handled by the HTTP.SYS kernel driver before being handed first to IIS and then to the respective handler for applications such as SharePoint. Connections refused at that level never reach IIS or SharePoint, hence the distinct lack of errors in their logs.

This driver uses Non Paged Pool (NPP) memory extensively; however, it has a little-known security feature that starts to refuse connections once only 20 MB of NPP remains.

At this point it’s important to remember that HTTP.SYS is not the cause of your problem, merely a symptom of a system with potentially a few things wrong with it. So how do we troubleshoot this little problem on a SharePoint WFE?

So the first place to check is how much NPP memory you have available. A 32-bit Windows 2003 server should have 256 MB of NPP available at boot. However, if for some reason you have the /3GB switch specified in your boot.ini, this NPP allocation is halved to 128 MB. In addition, the /3GB option is NOT SUPPORTED for SharePoint and must be removed (KB933560).
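As a rough illustration of the arithmetic, the check can be sketched in Python. The 256 MB, 128 MB and 20 MB figures are the ones quoted in this post; the sample boot.ini text, the 90 MB usage figure and the function names are assumptions made up for the example, so treat this as a sketch rather than a diagnostic tool.

```python
import re

NPP_DEFAULT_MB = 256    # figure quoted above for 32-bit Windows 2003 at boot
NPP_WITH_3GB_MB = 128   # figure quoted above when /3GB is set
HTTP_SYS_FLOOR_MB = 20  # HTTP.SYS refuses connections below this much free NPP

# Hypothetical boot.ini contents, for illustration only.
BOOT_INI_SAMPLE = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Standard" /fastdetect /3GB
"""


def has_3gb_switch(boot_ini_text):
    """True if any line carries /3GB as a standalone switch."""
    pattern = re.compile(r"(?<!\S)/3GB(?!\S)", re.IGNORECASE)
    return any(pattern.search(line) for line in boot_ini_text.splitlines())


def headroom_mb(boot_ini_text, used_mb):
    """Estimate NPP left before HTTP.SYS starts refusing connections."""
    limit = NPP_WITH_3GB_MB if has_3gb_switch(boot_ini_text) else NPP_DEFAULT_MB
    return limit - used_mb - HTTP_SYS_FLOOR_MB


print(has_3gb_switch(BOOT_INI_SAMPLE))
print(headroom_mb(BOOT_INI_SAMPLE, used_mb=90))
print(headroom_mb(BOOT_INI_SAMPLE.replace(" /3GB", ""), used_mb=90))
```

With a made-up 90 MB of NPP in use, the sample leaves 18 MB of headroom with /3GB set and 146 MB without it, which shows how quickly a slow leak eats the smaller pool.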

So at this point we’ve doubled the NPP available to us, but this may not have solved the problem, as something else may be taking that memory and leaking it away by not returning it properly to the pool. In these instances you need an application called PoolMon.exe from the Windows Server 2003 resource kit.

Using this tool it is possible to identify the driver or system file that is not returning memory properly to the NPP, which in my case was down to an old Broadcom Ethernet driver for the Dell PowerEdge server this system was running on.
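The underlying idea is simply to compare readings taken some time apart and see which pool tag keeps growing. Here is a minimal Python sketch of that comparison. It does not parse PoolMon output; it assumes you have already copied the per-tag nonpaged byte counts from two readings into dictionaries, and the tag names and numbers are invented for illustration.

```python
def fastest_growing_tags(before, after, top=3):
    """Rank pool tags by growth in bytes between two readings."""
    growth = {tag: after[tag] - before.get(tag, 0) for tag in after}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    # Only tags that actually grew are interesting for a leak hunt.
    return [(tag, delta) for tag, delta in ranked[:top] if delta > 0]


# Hypothetical readings (tag -> nonpaged bytes), taken some hours apart.
morning = {"TagA": 4_000_000, "TagB": 1_500_000, "TagC": 900_000}
evening = {"TagA": 4_100_000, "TagB": 9_500_000, "TagC": 880_000}

print(fastest_growing_tags(morning, evening))
```

In this invented data TagB has grown by 8,000,000 bytes and stands out immediately. A tag that grows steadily across several readings and never shrinks is the one to trace back to its driver.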

I won’t re-create the posts on how to use the PoolMon tool as there are some good ones out there, but the MS article that shows you how to use the PoolMon.exe program is below.

KB177415 – How to use Memory Pool Monitor to troubleshoot kernel mode memory leaks.

On a side note, if you have Windows 2003 SP2, you may need to think about disabling TCP Chimney as per KB945977.

Hope this goes some way to helping you resolve a similar issue.

Paul.
