Xml deserialization error with invalid character
Recently I hit a strange error: I serialized a complex object, and during deserialization I got:
"Error : System.Xml.XmlException: '.', hexadecimal value 0x00, is an invalid character. Line X, position Y."
It turned out that we had a "\0" inside a string, something like "Hello\0World", and during a standard serialization the character was encoded as "&#x0;".
With the help of Google I found this post (2007), which describes a way to get a happy deserialization.
You can use XmlTextReader instead of XmlReader, but after more research I found that you can still use XmlReader with an XmlReaderSettings whose CheckCharacters property is set to false.
Here is an example :
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class MyObject
{
    public string MyString { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        XmlSerializerFactory fact = new XmlSerializerFactory();
        XmlSerializer ser = fact.CreateSerializer(typeof(MyObject));
        MyObject obj0 = new MyObject();
        obj0.MyString = "Hello\0World";

        // Serialize the object
        StringWriter sw = new StringWriter();
        ser.Serialize(sw, obj0);
        string xml = sw.ToString();
        // We can check that in the xml the \0 is written as &#x0;
        Console.WriteLine(xml);

        // Classic use of XmlReader.Create (CheckCharacters defaults to true)
        StringReader sr1 = new StringReader(xml);
        XmlReader xr1 = XmlReader.Create(sr1); // xr1's type is XmlTextReaderImpl
        try
        {
            MyObject obj1 = (MyObject)ser.Deserialize(xr1);
            Console.WriteLine("XmlReader [CheckCharacters({0})] : Success : {1}", xr1.Settings.CheckCharacters, obj1.MyString);
            Console.WriteLine(obj1.MyString);
        }
        catch (Exception e)
        {
            // The XmlException is wrapped, so print the inner exception
            Console.WriteLine("XmlReader [CheckCharacters({0})] : Error : {1}", xr1.Settings.CheckCharacters, e.InnerException);
        }

        // Using an XmlTextReader
        StringReader sr2 = new StringReader(xml);
        XmlTextReader xr2 = new XmlTextReader(sr2);
        // xr2.Settings is null
        MyObject obj2 = (MyObject)ser.Deserialize(xr2);
        Console.WriteLine("XmlTextReader : Success : {0}", obj2.MyString);

        // Using XmlReader with the right XmlReaderSettings
        StringReader sr3 = new StringReader(xml);
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.CheckCharacters = false; // default value is true
        XmlReader xr3 = XmlReader.Create(sr3, settings); // xr3.Settings.CheckCharacters is read-only and xr3's type is XmlTextReaderImpl
        MyObject obj3 = (MyObject)ser.Deserialize(xr3);
        Console.WriteLine("XmlReader [CheckCharacters({0})] : Success : {1}", xr3.Settings.CheckCharacters, obj3.MyString);
    }
}
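If you need this in several places, the XmlReaderSettings variant can be wrapped in a small helper. This is only a minimal sketch: the LenientXml class and its Deserialize<T> method are names I made up for illustration, they are not part of the framework, and MyObject is repeated so the snippet compiles on its own.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class MyObject
{
    public string MyString { get; set; }
}

// Hypothetical helper (not a framework type): deserializes XML text
// with character checking turned off on the reader.
public static class LenientXml
{
    public static T Deserialize<T>(string xml)
    {
        // CheckCharacters = false lets character references such as &#x0; through
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.CheckCharacters = false;

        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringReader stringReader = new StringReader(xml))
        using (XmlReader xmlReader = XmlReader.Create(stringReader, settings))
        {
            return (T)serializer.Deserialize(xmlReader);
        }
    }
}
```

A call would look like LenientXml.Deserialize<MyObject>(xml). Keep in mind that turning the check off means the resulting strings can contain characters that XML 1.0 does not allow, so the rest of the application has to cope with them.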
XmlWriter and UTF-8 encoding without signature
I used this code to serialize some objects to XML:
XmlWriter writer = new XmlTextWriter(stream, Encoding.UTF8);
But the output starts with a UTF-8 signature, the Byte Order Mark (BOM). The signature is usually meant for XML files; if you want to write the output to an HttpResponse, you don't need it. (It can even cause parsing errors in some parsers, for example an org.xml.sax.SAXException in Java.)
Here is the change to remove the BOM:
XmlWriter writer = new XmlTextWriter(stream, new UTF8Encoding(false));
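To see the difference for yourself, you can write the same tiny document with both encodings and look at the first three bytes. The sketch below is only an illustration: BomDemo, WriteSample and StartsWithUtf8Bom are hypothetical names, and the "root" element is an arbitrary placeholder.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class, not a framework type.
public static class BomDemo
{
    // Writes a one-element document with the given encoding and returns the raw bytes.
    public static byte[] WriteSample(Encoding encoding)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            using (XmlTextWriter writer = new XmlTextWriter(stream, encoding))
            {
                writer.WriteStartElement("root"); // placeholder element name
                writer.WriteEndElement();
                writer.Flush();
            }
            // ToArray still works after the writer has closed the MemoryStream
            return stream.ToArray();
        }
    }

    // True when the bytes start with the UTF-8 signature EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }
}
```

With Encoding.UTF8 the returned bytes should begin with EF BB BF, and with new UTF8Encoding(false) they should begin directly with the markup. The boolean passed to the UTF8Encoding constructor is encoderShouldEmitUTF8Identifier, which is what controls the signature.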