Thursday, 14 November 2013

NServiceBus fails to connect to RavenDB during high load

 
Recently I experienced a serious issue with NServiceBus and RavenDB: the NServiceBus endpoints were no longer able to connect to RavenDB, and the following error message was written to the Windows event log:
System.Net.Sockets.SocketException (0x80004005): An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
It turned out that this is a known issue when using RavenDB. The solution is to enable the HttpWebRequest.UnsafeAuthenticatedConnectionSharing setting on the RavenDB client connection. Be aware of the security implications of this setting, which are described in the MSDN documentation.
Version 4.0.0 added a configuration setting named EnableRavenRequestsWithUnsafeAuthenticatedConnectionSharingAndPreAuthenticate to the configuration API, which could be set to avoid the issue. However, the NServiceBus configuration API for RavenDB has since been rewritten, and there is currently no documentation on how to enable this setting with the new API.

Follow these two steps to enable the setting:

Step 1:
Add a reference to RavenDB.Client in your NServiceBus project. Use the same version as the one referenced by your NServiceBus version.
Step 2:
Use the CustomiseRavenPersistence method on the configuration API to register a callback which can be used to configure the RavenDB client connection:
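    using System.Net;      // HttpWebRequest
    using NServiceBus;     // IConfigureThisEndpoint, AsA_Server, IWantCustomInitialization, Configure

    public class NServiceBusConfigurator : IConfigureThisEndpoint, AsA_Server, IWantCustomInitialization
    {
        public void Init()
        {
            // Register a callback that receives the RavenDB document store used by NServiceBus
            Configure.Instance.CustomiseRavenPersistence(ConfigureRavenStore);
        }

        private static void ConfigureRavenStore(Raven.Client.IDocumentStore store)
        {
            // Invoked for the HTTP requests created by the RavenDB client
            store.JsonRequestFactory.ConfigureRequest += (sender, args) =>
            {
                var httpWebRequest = (HttpWebRequest) args.Request;
                // Allow authenticated connections to be shared, so fewer sockets are opened
                // (read the MSDN security notes for this property before enabling it)
                httpWebRequest.UnsafeAuthenticatedConnectionSharing = true;
                // Send the Authorization header up front instead of waiting for a 401 challenge
                httpWebRequest.PreAuthenticate = true;
            };
        }
    }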

Thursday, 3 January 2013

Wireshark to the rescue

Wireshark is a free and open source network protocol analyzer which can be really useful when analyzing a wide range of network-related issues.

Recently it turned out to be a real lifesaver on the project I am currently working on. A web service client application, developed by a consultancy in India, was going to call a web service that my company hosted on a test server in Oslo. But no matter how hard the developers in India tried, the call would not work, and they always got the following exception:

System.Net.WebException was caught
  Message="The underlying connection was closed: The connection was closed unexpectedly."
  Source="System.Web.Services"
  StackTrace:
       at System.Web.Services.Protocols.WebClientProtocol.GetWebResponse(WebRequest request)
       at System.Web.Services.Protocols.HttpWebClientProtocol.GetWebResponse(WebRequest request)
       at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)

However, everything worked fine when we tested our service from external and internal networks in Oslo.

When tracing the incoming request from India in Wireshark we could see the following:

[Screenshot: Wireshark trace of the incoming traffic]

The request reached our server, but we were unable to send the “100 Continue” response back to the client. It was possible to reach our web server through a browser on the client machine, so no firewall appeared to be blocking the communication. It seemed as if the connection had been closed by the client.

Next we got the developers in India to try the same request in SoapUI, and then it worked! This made us think that the problem was in the client application and not at the infrastructure level, so we spent several hours troubleshooting the client environment, without any success. Google gave us numerous reports (1, 2, 3) of other people experiencing the same issue, but the suggested solutions neither worked nor explained the exact cause of the problem. Most of them involved removing Keep-Alive from the HTTP headers and using HTTP 1.0 instead of HTTP 1.1.

The next step was to capture the request with Fiddler Web Debugger on the calling server in India and then replay it. The first replay failed, as expected:

HTTP/1.1 504 Fiddler - Receive Failure
Content-Type: text/html; charset=UTF-8
Connection: close
Timestamp: 22:17:14.207

[Fiddler] ReadResponse() failed: The server did not return a response for this request.     

So there was no reply from our server. Next we removed the “Connection: Keep-Alive” header, as suggested by some of the blog posts we had found on Google, and resubmitted the request in Fiddler:

[Screenshot: the request headers in Fiddler]

And now the request worked in Fiddler! Once the TCP connection had been established, we could even replay the original failing request, and it would succeed.

But why did this work?

Based on the test results in Fiddler we arrived at the conclusion that the problem was not in the client application, but rather at the infrastructure level.

So we installed Wireshark on the calling server and did some more tracing. Finally we could see what was causing the problem:

[Screenshot: Wireshark trace showing the ICMP “fragmentation needed” message]

A router is telling us that the size of our IP datagram is too big, and that it needs to be fragmented. This is communicated back to the calling server by the ICMP message shown in the picture above.

By inspecting the ICMP message in Wireshark we can find some more details:

[Screenshot: Wireshark details of the ICMP message]

There are several interesting things to observe in the picture above:

  1. The problem occurs when the router with IP address 209.58.105.21 tries to forward the datagram to the next hop (this is a backbone router located in Mumbai).
  2. The next hop accepts datagrams of at most 1496 bytes, while we are sending 1500 bytes.
  3. The router at 209.58.105.21 sends an ICMP message back to the caller saying that the datagram needs to be fragmented.
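
The behaviour in the list above follows the IPv4 rules for datagrams with the Don't Fragment (DF) bit set, which an ICMP “fragmentation needed” message implies. Below is a simplified model in Python of the router's forwarding decision and of the reaction expected from a sender doing path MTU discovery (RFC 1191). The class and function names are hypothetical, and real network stacks do much more than this:

```python
from dataclasses import dataclass
from typing import Optional

# ICMP Destination Unreachable (type 3), code 4:
# "fragmentation needed and DF set" (RFC 792, RFC 1191).
ICMP_DEST_UNREACHABLE = 3
ICMP_FRAG_NEEDED = 4


@dataclass
class IcmpMessage:
    icmp_type: int
    code: int
    next_hop_mtu: int  # RFC 1191 carries the next-hop MTU in the ICMP message


def forward(datagram_size: int, dont_fragment: bool, next_hop_mtu: int) -> Optional[IcmpMessage]:
    """Simplified router decision for one IPv4 datagram.

    Returns None if the datagram is forwarded (possibly fragmented by the
    router), or the ICMP error sent back to the source if it is dropped.
    """
    if datagram_size <= next_hop_mtu:
        return None  # fits on the next link as it is
    if not dont_fragment:
        return None  # the router is allowed to fragment it itself
    # DF is set: drop the datagram and tell the sender which MTU to use.
    return IcmpMessage(ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED, next_hop_mtu)


def on_icmp(path_mtu: int, msg: IcmpMessage) -> int:
    """Sender side of path MTU discovery: lower the path MTU on code 4."""
    if msg.icmp_type == ICMP_DEST_UNREACHABLE and msg.code == ICMP_FRAG_NEEDED:
        return min(path_mtu, msg.next_hop_mtu)
    return path_mtu


# The situation from the trace: 1500-byte datagrams, next hop accepts 1496.
error = forward(1500, dont_fragment=True, next_hop_mtu=1496)
print(error)              # IcmpMessage(icmp_type=3, code=4, next_hop_mtu=1496)
new_mtu = on_icmp(1500, error)
print(new_mtu)            # 1496
print(new_mtu - 20 - 20)  # 1456, the TCP MSS without IP/TCP options
```

In our case the calling server never performed the on_icmp step, so it kept sending 1500-byte datagrams that the router kept dropping.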

By executing the “tracert” command on the remote server we could get some more information about where on the route the problem occurred:

[…]
  3    26 ms    26 ms    26 ms  203.200.137.9.ill-chn.static.vsnl.net.in [203.200.137.9]
  4    31 ms    31 ms    31 ms  59.165.191.41.man-static.vsnl.net.in [59.165.191.41]
  5    66 ms    66 ms    66 ms  121.240.226.26.static-mumbai.vsnl.net.in [121.240.226.26]
  6    70 ms    70 ms    70 ms  if-14-0-0-101.core1.MLV-Mumbai.as6453.net [209.58.105.21]
  7   184 ms   172 ms   171 ms  if-11-3-2-0.tcore1.MLV-Mumbai.as6453.net [180.87.38.10]
  8   174 ms   173 ms   194 ms  if-9-5.tcore1.WYN-Marseille.as6453.net [80.231.217.17]
  9   175 ms   176 ms   175 ms  if-8-1600.tcore1.PYE-Paris.as6453.net [80.231.217.6]
10   191 ms   176 ms   229 ms  80.231.154.86
11   174 ms   174 ms   213 ms  prs-bb2-link.telia.net [213.155.131.10]
[…]
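
Matching the source address of the ICMP message against the tracert output is easy to do by eye, but it can also be scripted. A small Python sketch, assuming the Windows tracert line format shown above; parse_hops and find_hop are hypothetical helpers, and lines with timeouts are not handled:

```python
import re

# The hops from the tracert output above (3-11).
TRACERT = """\
  3    26 ms    26 ms    26 ms  203.200.137.9.ill-chn.static.vsnl.net.in [203.200.137.9]
  4    31 ms    31 ms    31 ms  59.165.191.41.man-static.vsnl.net.in [59.165.191.41]
  5    66 ms    66 ms    66 ms  121.240.226.26.static-mumbai.vsnl.net.in [121.240.226.26]
  6    70 ms    70 ms    70 ms  if-14-0-0-101.core1.MLV-Mumbai.as6453.net [209.58.105.21]
  7   184 ms   172 ms   171 ms  if-11-3-2-0.tcore1.MLV-Mumbai.as6453.net [180.87.38.10]
  8   174 ms   173 ms   194 ms  if-9-5.tcore1.WYN-Marseille.as6453.net [80.231.217.17]
  9   175 ms   176 ms   175 ms  if-8-1600.tcore1.PYE-Paris.as6453.net [80.231.217.6]
 10   191 ms   176 ms   229 ms  80.231.154.86
 11   174 ms   174 ms   213 ms  prs-bb2-link.telia.net [213.155.131.10]
"""

# Hop number, three round-trip times, then "host [ip]" or a bare IP.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\d+) ms\s+(\d+) ms\s+(\d+) ms\s+(.+?)\s*$")
BRACKETED_IP = re.compile(r"\[([\d.]+)\]$")


def parse_hops(output):
    """Yield (hop, rtts_ms, host, ip) tuples.

    Simplification: lines containing timeouts ('*') or '<1 ms' do not match
    the pattern and are skipped.
    """
    for line in output.splitlines():
        m = HOP_LINE.match(line)
        if not m:
            continue
        target = m.group(5)
        b = BRACKETED_IP.search(target)
        host = target[:b.start()].strip() if b else target
        ip = b.group(1) if b else target
        yield int(m.group(1)), [int(m.group(i)) for i in (2, 3, 4)], host, ip


def find_hop(output, ip):
    """Hop number whose address is ip, or None if it is not on the route."""
    return next((hop for hop, _, _, hop_ip in parse_hops(output) if hop_ip == ip), None)


hops = {hop: (rtts, host) for hop, rtts, host, _ in parse_hops(TRACERT)}
n = find_hop(TRACERT, "209.58.105.21")  # source address of the ICMP message
print(n)                                 # 6
print(hops[n + 1][1])                    # if-11-3-2-0.tcore1.MLV-Mumbai.as6453.net
```

It confirms that 209.58.105.21 is hop 6, and that the round-trip time jumps from about 70 ms to over 170 ms at the next hop.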

Conclusions

A Cisco white paper (see Resources) describes the behaviour we observed above. The router that requested fragmentation of the datagram did nothing wrong; it acted according to the protocol standards. The problem was that the OS and/or network drivers on the calling server did not act on the ICMP message: they neither used IP fragmentation nor reduced the MTU to a value that would not require fragmentation.

According to the Cisco white paper, it is a common problem that firewalls block this ICMP message, but that was not the case in our scenario.

And what about the request we got working in Fiddler by removing “Connection: Keep-Alive” from the header? It worked because removing this header made the datagram small enough (at most 1496 bytes) to pass without fragmentation.
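
The explanation above can be checked with a little arithmetic. The Python sketch below assumes IPv4 and TCP headers without options (20 bytes each), a sender MTU of 1500 bytes and the 1496-byte next-hop MTU from the ICMP message. The request length of 1470 bytes is purely hypothetical, since the actual size was not recorded, and the helper names are made up for illustration:

```python
IP_HEADER = 20                        # IPv4 header without options
TCP_HEADER = 20                       # TCP header without options
MSS = 1500 - IP_HEADER - TCP_HEADER   # 1460, from the sender's 1500-byte MTU
NEXT_HOP_MTU = 1496                   # value reported in the ICMP message

KEEP_ALIVE = b"Connection: Keep-Alive\r\n"


def datagram_sizes(payload_len, mss=MSS):
    """IP datagram sizes needed to carry payload_len bytes of TCP data."""
    sizes = []
    while payload_len > 0:
        chunk = min(payload_len, mss)
        sizes.append(chunk + IP_HEADER + TCP_HEADER)
        payload_len -= chunk
    return sizes


def fits_path(payload_len):
    """True if no datagram exceeds the next-hop MTU."""
    return all(size <= NEXT_HOP_MTU for size in datagram_sizes(payload_len))


request_len = 1470  # hypothetical request size in bytes
print(len(KEEP_ALIVE))                                # 24
print(datagram_sizes(request_len))                    # [1500, 50]
print(fits_path(request_len))                         # False
print(datagram_sizes(request_len - len(KEEP_ALIVE)))  # [1486]
print(fits_path(request_len - len(KEEP_ALIVE)))       # True
```

Under these assumptions, removing the 24-byte header only helps when the request carried between 1457 and 1480 bytes of TCP payload; smaller requests fit anyway, and larger ones still produce a full 1500-byte first segment.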

Resources

Wireshark homepage: http://www.wireshark.org/

Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC: http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml

Friday, 31 August 2012

Presenting Luup's Payment Platform at Integrasjonsdagene 2012

The conference Integrasjonsdagene 2012 took place in Halden on 30 and 31 August. I presented the part of the talk titled “Our Experiences and Challenges with BizTalk”.

Thursday, 29 September 2011

First experiences with using WinRM/WinRS for remote deployment

What is WinRM/WinRS?

Windows Remote Management (WinRM) is a remote management service that was first released with Windows Server 2003 R2.

WinRM is a server component, while Windows Remote Shell (WinRS) is a client which can be used for executing programs remotely on computers which run WinRM.

The following example shows how to remotely list the contents of the C:\ folder on a computer with host name Server01:

WinRS -r:Server01 dir c:\

Using WinRM for remote deployment

My first encounter with WinRM/WinRS was using it to execute PowerShell scripts for automated remote deployment of a test environment. The commands were executed from an MSBuild script in a CruiseControl.Net build.

The scripts would first uninstall any old versions of the components, and then renew databases and install new component versions. Finally a set of NUnit tests would be executed on the environment.

WinRS failing to execute remote commands due to limited quotas

It was very easy to get started with WinRS, and in the beginning everything seemed to work fine. But now and then the execution failed with System.OutOfMemoryException or with the message “Process is terminated due to StackOverflowException.”

The cause of these problems was not obvious, since the error messages did not mention quotas, but after some investigation it turned out that the memory quota on the server was too low. The default memory quota is 150 MB, and it can be changed by executing the following command on the remote server (this sets the quota to 1000 MB, roughly 1 GB):

WinRM set winrm/config/Winrs @{MaxMemoryPerShellMB = "1000"}

Multi-Hop configuration

In one of my scripts I tried to use a UNC path to access a remote share from the target computer, but got “Access is denied”. It turned out that the Credential Security Support Provider (CredSSP) had to be configured on both the client and the server in order to achieve this: http://msdn.microsoft.com/en-us/library/windows/desktop/ee309365(v=VS.85).aspx

Resources

Configuring WinRM

Quota Management for Remote Shells

Thursday, 4 August 2011

Using Gendarme with CruiseControl.Net for code analysis

Gendarme is developed as part of the Mono project and is a tool for code analysis. It comes with a wide range of predefined rules and can easily be extended with your own custom rules, which you can write in C# or other .Net languages.

Configuring the CruiseControl.Net build task

CruiseControl.Net has been delivered with the Gendarme task since version 1.4.3. However, the Gendarme executable must be downloaded and installed separately. The binary can be downloaded from this link: https://github.com/spouliot/gendarme/downloads


Gendarme is designed to process the build output assemblies in a single directory, i.e. it does not search for assemblies recursively. That fits well if you have one CruiseControl.Net build project per service or application, but in my case I wanted to generate a report for an entire product branch containing multiple services and applications.

This can be achieved by using the <assemblyListFile> configuration element, which lets you specify a file that contains the full path to each assembly which should be analysed.
In order to generate the file, I execute the following PowerShell command:

Get-ChildItem -Path 'D:\SomeDir\Work' -Recurse -Include MyCompany*.dll -Exclude *.Test*.dll,*Generated.dll | sort -Property Name -Unique | sort -Property FullName | foreach {$_.FullName} | Out-File -FilePath 'D:\SomeDir\Artifact\AssembliesForCodeAnalysis.txt' -Width 255
The PowerShell command above recursively scans the directory “D:\SomeDir\Work” and includes all DLL files whose names start with “MyCompany”, excluding those that match “*.Test*.dll” (such as MyCompany.Foo.Tests.dll) or end with “Generated.dll”. Next it keeps only one file per file name, regardless of path (to filter out shared assemblies that are duplicated across output folders), sorts the result by full path and writes it to file.
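
The selection logic of the PowerShell pipeline can also be sketched in Python, for example to check which assemblies end up in the list. This is an approximation rather than an exact port: matching is made case-insensitive as on Windows, the duplicate kept for each file name is simply the first one os.walk finds, and the output file is written as UTF-8. The names list_assemblies and write_list are hypothetical:

```python
import fnmatch
import os


def list_assemblies(root, include="MyCompany*.dll",
                    exclude=("*.Test*.dll", "*Generated.dll")):
    """Approximate the Get-ChildItem | sort -Unique | sort pipeline above."""
    by_name = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            lower = name.lower()  # Windows file names are case-insensitive
            if not fnmatch.fnmatchcase(lower, include.lower()):
                continue
            if any(fnmatch.fnmatchcase(lower, p.lower()) for p in exclude):
                continue
            # Keep one path per file name, dropping shared assemblies that
            # were copied into several output folders.
            by_name.setdefault(lower, os.path.join(dirpath, name))
    # Sort the remaining paths by full path, like "sort -Property FullName".
    return sorted(by_name.values(), key=str.lower)


def write_list(root, out_file):
    """Write one full path per line, ready for <assemblyListFile>."""
    with open(out_file, "w", encoding="utf-8") as f:
        for path in list_assemblies(root):
            f.write(path + "\n")
```

For example, write_list(r"D:\SomeDir\Work", r"D:\SomeDir\Artifact\AssembliesForCodeAnalysis.txt") would produce a list comparable to the one generated by the PowerShell command.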

Using the PowerShell command as an executable step, the project configuration in ccnet.config turns into this:

        </msbuild>
        <exec>
          <executable>powershell</executable>
          <buildArgs>-Command "Get-ChildItem -Path 'D:\SomeDir\Work' -Recurse -Include MyCompany*.dll -Exclude *.Test*.dll,*Generated.dll | sort -Property Name -Unique | sort -Property FullName | foreach {$_.FullName} | Out-File -FilePath 'D:\SomeDir\Artifact\AssembliesForCodeAnalysis.txt' -Width 255"</buildArgs>
        </exec>
        <gendarme>
          <executable>C:\Program Files (x86)\Gendarme\gendarme.exe</executable>
          <assemblyListFile>D:\SomeDir\Artifact\AssembliesForCodeAnalysis.txt</assemblyListFile>
          <baseDirectory>D:\SomeDir\Work</baseDirectory>
          <limit>2000</limit>
          <severity>medium+</severity>
          <confidence>high</confidence>
          <quiet>FALSE</quiet>
          <verbose>TRUE</verbose>
          <failBuildOnFoundDefects>FALSE</failBuildOnFoundDefects>
          <verifyTimeoutSeconds>600</verifyTimeoutSeconds>
        </gendarme>
      </tasks>
      <publishers>
        <merge>
          <files>
            <file>D:\SomeDir\Artifact\test-results\*.xml</file>
            <file>D:\SomeDir\Artifact\gendarme-results.xml</file>
          </files>
        </merge>
        <statistics />

Configuring the Dashboard

The stylesheets needed to show the formatted reports in the CruiseControl.Net dashboard are included with the CruiseControl.Net installation, and just need to be referenced in dashboard.config:

    <buildPlugins>
      <buildReportBuildPlugin>
        <xslFileNames>
          <xslFile>xsl\gendarme-summary-ccnet.xsl</xslFile>
        </xslFileNames>
      </buildReportBuildPlugin>
      <xslReportBuildPlugin description="Gendarme Report" actionName="GendarmeBuildReport" xslFileName="xsl\gendarme-report-ccnet.xsl"/>

Resources

Gendarme home page: http://www.mono-project.com/Gendarme

Gendarme CCNet task configuration: http://confluence.public.thoughtworks.org/display/CCNET/Gendarme+Task

Sunday, 19 December 2010

Intellisense for CruiseControl.Net configuration files

Editing the CruiseControl.Net configuration file ccnet.config can be a cumbersome process. The XML configuration elements are documented at http://ccnetlive.thoughtworks.com/ccnet/doc/CCNET/Configuring%20the%20Server.html, but it would be more convenient to have intellisense available when editing the configuration file.

Intellisense for CCNet configuration files can be added to Visual Studio by using the schema definition file ccnet.xsd. Unfortunately this file is not included in the CCNet installation package, but it is part of the source distribution. For the current version, the file is located at “\project\ccnet.xsd” in the downloadable source distribution zip file.

You can also get it from the source code repository at SourceForge (the link points to version 1.5).

Adding the XSD schema to Visual Studio

Once you have got hold of the ccnet.xsd file, it must be copied to the schema folder of your Visual Studio installation, e.g. to “C:\Program Files (x86)\Microsoft Visual Studio 10.0\Xml\Schemas\”.

Note: Copying the file to the folder “Microsoft Visual Studio 10.0\Common7\Packages\schemas\xml” will not have any effect!

Configuring the namespace

Which namespace should be used for the CCNet configuration files? A namespace must be specified in order for Visual Studio to know which schema to use for intellisense.

ccnet.xsd defines the target namespace "http://thoughtworks.org/ccnet/1/5":

<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="http://thoughtworks.org/ccnet/1/5"
    elementFormDefault="qualified"
    xmlns="http://thoughtworks.org/ccnet/1/5"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="cruisecontrol">

… which means that the following namespace must be defined in the CCNet configuration files:

<cruisecontrol xmlns="http://thoughtworks.org/ccnet/1/5">
  <project name="foo">
    <webURL>http://localhost/ccnet</webURL>
    <modificationDelaySeconds>10</modificationDelaySeconds>
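
Since Visual Studio only knows which schema to use when the root element is in the schema's target namespace, a quick check of existing configuration files can be handy. A Python sketch using the standard library's ElementTree; the helper name is hypothetical, and it only inspects the root element rather than validating the whole file against ccnet.xsd:

```python
import xml.etree.ElementTree as ET

CCNET_NS = "http://thoughtworks.org/ccnet/1/5"


def uses_ccnet_schema_namespace(config_text):
    """True if the root is <cruisecontrol> in the ccnet.xsd target namespace."""
    root = ET.fromstring(config_text)
    # ElementTree reports namespaced tags as "{namespace}localname".
    return root.tag == "{%s}cruisecontrol" % CCNET_NS


with_ns = """<cruisecontrol xmlns="http://thoughtworks.org/ccnet/1/5">
  <project name="foo">
    <webURL>http://localhost/ccnet</webURL>
    <modificationDelaySeconds>10</modificationDelaySeconds>
  </project>
</cruisecontrol>"""

without_ns = "<cruisecontrol><project name='foo' /></cruisecontrol>"

print(uses_ccnet_schema_namespace(with_ns))     # True
print(uses_ccnet_schema_namespace(without_ns))  # False
```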

The schema file seems to favour XML elements over attributes for many configuration options, which differs from many of the example configurations distributed with CCNet, but I don’t consider this a big issue.

Wednesday, 26 May 2010

Presentation: “Introduksjon til NServiceBus” (Introduction to NServiceBus)

On 25 May I presented NServiceBus at NNUG Oslo. The presentation (in Norwegian) is now available on Slideshare: