
XML Bugs in .NET 2.0 and The MS Support Call Process : Migration Woes Part 2

Scenario: We recently upgraded one of the XML servers we run to .NET 2.0 and started noticing a problem with its serialization of response messages. We have boiled it down to a problem the XmlSerializer has with a schema group containing an element with no content model. In our case, it looks like this.

The .NET 1.1 framework serializes a greeting element correctly

Test 2006-05-04T11:01:58.1Z

but although it seemed to be fine initially in .NET 2.0, we started getting this instead.

Test 2006-05-04T10:55:07.9Z

Diagnosis: It turns out that the new XmlSerializer code in .NET 2.0 has a bug in the way it deals with empty elements in a group. If a struct or class member can have multiple types (multiple XmlElementAttribute declarations in the CLR, a choice complexType in XSD), .NET 2.0 does not serialize it according to the derivation hierarchy, which causes the wrong XML output above. When the code is generated for the temporary DLL that performs the actual serialization, the order in which the types are checked to choose how to serialize the member is arbitrary, so the error may or may not be reproduced.
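To make the ordering problem concrete, here is a rough, language-neutral sketch in Python. It is not the code XmlSerializer generates, and the names (Greeting, element_name, the two check lists) are made up for illustration. It also assumes the member can hold either a greeting or an empty element mapped to the base object type, which is my reading of the diagnosis rather than anything Microsoft has confirmed. It only shows why checking a base type before a derived one picks the wrong element.

```python
# Hypothetical stand-in for a type with real content (name is made up).
class Greeting:
    def __init__(self, server_id, server_date):
        self.server_id = server_id
        self.server_date = server_date


def element_name(value, checks):
    """Return the element name of the first (type, name) pair that matches."""
    for candidate_type, name in checks:
        if isinstance(value, candidate_type):
            return name
    raise TypeError("no element mapped for %r" % type(value))


# Most-derived type first: the greeting is recognised as a greeting.
derived_first = [(Greeting, "greeting"), (object, "hello")]

# Base type first: every value is an `object`, so the first branch always wins.
base_first = [(object, "hello"), (Greeting, "greeting")]

value = Greeting("Test", "2006-05-04T11:01:58.1Z")
print(element_name(value, derived_first))  # greeting
print(element_name(value, base_first))     # hello
```

If the order of the checks is not pinned to the derivation hierarchy, either list could end up in the generated code, which would fit with the bug showing up on some builds and not others.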

This is now logged with Microsoft in their bug database here [http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=b071422f-9226-43e9-a99d-560d538b76d6] and is still awaiting resolution.

For us at least, the hard part was replicating the bug with, as it turned out, ‘arbitrary behaviour’. Indeed, on a clean machine in .NET 2.0 the bug seemed not to occur unless you kickstarted it. A bit of clarification on this. We created a simple command line app that demonstrated the bug. It’s linked to in the MS Bug Report [http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=b071422f-9226-43e9-a99d-560d538b76d6] if you’re interested. The problem was that

But hang on, we now have three apps with identical code exhibiting different behaviours, the only difference being that one was built and run before the bug was kickstarted. Even if we ran Proj1 again after Proj3 there were still no signs of the bug. Now Microsoft note that the behaviour of the bug itself is arbitrary, but there seems to be a pretty definite on switch. Where’s the off switch I wonder?

While I’m waiting for MS to get back to me with a fix, I’ve been looking at workarounds. Two spring to mind:

Both make sense, except that in this case the former means changing a schema that is laid out in an RFC de-facto standard, which I can’t do, and the latter (as far as I am aware) means altering code that was automatically generated by xsd.exe. Should there be a need to regenerate this code (an extension to the schema, perhaps), there would also need to be several warnings and explanations on how to re-edit the new code so that it serializes correctly again. Neither is great. Ah well.

Comments: It’s ironic that the reason we moved the code affected by this problem to .NET 2.0 was a different bug in .NET 1.1 SP1 involving classes generated from schemas spread across different XSD documents. We avoided that by not installing SP1 on our Win2k boxes. As we upgraded to Win2K3, it became apparent that the version of .NET 1.1 installed by default with the OS included the bug we had previously avoided. Now we’re hit by another one in .NET 2.0. It can be worked around, but you can appreciate the irony.

This is the first time I’ve ever used one of the Microsoft Support Calls that come with my MSDN subscription and, aside from an email going astray initially, you’ve got to hand it to the MS support staff. They’ve been pretty responsive thus far with the diagnosis. Of course, I now have to wait for someone on the actual XML team to create a quick fix that I can test, but the whole process was explained nicely. For reference, it works like this.

Before we pushed it out to Microsoft, we struggled for a few days to isolate this bug, as it was partially hidden inside another migration issue (see Migration Woes part 3 for more on that). Once we realized there were two separate issues, it was interesting to learn that the temporary DLL that serializes classes to XML identifies System.Object differently in .NET 1.1 and 2.0. In .NET 1.1, the hello class is described as hello.System.Object… In .NET 2.0, it’s helloZSystem.Object, mscorlib, Version=, Culture=neutral, PublicKeyToken=b77a5c561934e089.

There’s not much documentation on how the XML Serializer works (or doesn’t) that I could find, but Kirk Allen Evans [http://xmladvice.com/blogs/kaevans/archive/2004/02/11/5934.aspx], Christoph Schittko [http://weblogs.asp.net/cschittko/articles/33045.aspx], and Scott Hanselman [http://www.hanselman.com/blog/HOWTODebugIntoANETXmlSerializerGeneratedAssembly.aspx] all had useful posts on ways to approach the problem before we concluded that it was an actual .NET bug. Worth reading for future reference if you’re interested.

Posted on May 24, 2006   #Geek Stuff  
