Whaly's World tail -f /var/log/whaly

18Aug/110

Xml deserialization error with invalid character

Recently I had a strange error, I serialized a complex object and during the deserialization process I got :
"Error : System.Xml.XmlException: '.', hexadecimal value 0x00, is an invalid character. Line X, position Y."
It appears that we had a "\0" inside a string, something like "Hello\0World" ! and during a classic serialization, the character was encoded in "�"

With the help of Google I have found this post (2007) with a way to have an happy deserialization :-)

You can use XmlTextReader instead of XmlReader, but with more research I have found that you can still use XmlReader with XmlReaderSettings and CheckCharacters set to false.

Here is an example :

    public class MyObject
    {
        public string MyString { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {

            XmlSerializerFactory fact = new XmlSerializerFactory();
            XmlSerializer ser = fact.CreateSerializer(typeof(MyObject));

            MyObject obj0 = new MyObject();
            obj0.MyString = "Hello\0World";

            // Serialize the object
            StringWriter sw = new StringWriter();
            ser.Serialize(sw, obj0);
            string xml = sw.ToString();
            // We can check that in the xml a \0 is transformed in �
            Console.WriteLine(xml);

            // Classic use of XmlReader.Create
            StringReader sr1 = new StringReader(xml);
            XmlReader xr1 = XmlTextReader.Create(sr1); // xr1's type is XmlTextReaderImpl
            try
            {
                MyObject obj1 = (MyObject)ser.Deserialize(xr1);
                Console.WriteLine("XmlReader [CheckCharacters({0})] : Success : {1}", xr1.Settings.CheckCharacters, obj1.MyString);
                Console.WriteLine(obj1.MyString);
            }
            catch (Exception e)
            {
                Console.WriteLine("XmlReader [CheckCharacters({0})] : Error : {1}", xr1.Settings.CheckCharacters, e.InnerException);
            }

            // Using an XmlTextReader
            StringReader sr2 = new StringReader(xml);
            XmlTextReader xr2 = new XmlTextReader(sr2);
            // xr2.Settings is null
            MyObject obj2 = (MyObject)ser.Deserialize(xr2);
            Console.WriteLine("XmlTextReader : Success : {0}", obj2.MyString);

            // Using XmlReader with the good XmlReaderSettings
            StringReader sr3 = new StringReader(xml);
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.CheckCharacters = false; // default value is true;
            XmlReader xr3 = XmlTextReader.Create(sr3, settings); // xr3.Settings.CheckCharacters is a read only and xr3's type is XmlTextReaderImpl
            MyObject obj3 = (MyObject)ser.Deserialize(xr3);
            Console.WriteLine("XmlReader [CheckCharacters({0})] : Success : {1}", xr3.Settings.CheckCharacters, obj3.MyString);
        }
    }

Tagged as: , No Comments
16Jul/092

XmlWriter and UTF-8 encoding without signature

I used this code to serialize some objects in Xml :

XmlWriter writer = new XmlTextWriter(stream, Encoding.UTF8);

But the output contains an UTF header, the Byte Order Mark (BOM). The use of the header/signature is usually for xml file, if you want to use the ouput in an HttpResponse, you don't need the signature. (some parser can cause a parsing error in java, like org.xml.sax.SAXException).

Here is the change to remove the BOM :

XmlWriter writer = new XmlTextWriter(stream, new UTF8Encoding(false));

Tagged as: , , , 2 Comments