Whaly's World tail -f /var/log/whaly

28 Jan 2010

How to remove diacritics in C# (.NET)

If you need to remove diacritics from a string, you can:

  • [Bad] do lots of replacements like this :-(
inputString = inputString.Replace('À', 'A');

Repeat for ÀÁÂÃÄâãäàáÈÉÊËêëèéÌÍÎÏîïìíÒÓÔÖôõöòóÙÚÛÜûüùúÝýÑñç, and you may still miss some exotic characters, or the Õ that is missing from this list!

  • [Good] or use this function :-)
	// Requires: using System.Globalization; using System.Text;
	public static string RemoveDiacritics(string inputString)
	{
		//!\\ Warning: 'œ' has no Unicode decomposition, so it is left as-is and does not become 'oe'
		// FormD splits each accented character into a base character followed by its combining marks
		string normalizedString = inputString.Normalize(NormalizationForm.FormD);
		StringBuilder stringBuilder = new StringBuilder(normalizedString.Length);
		foreach (char c in normalizedString)
		{
			// Keep everything except the combining (non-spacing) marks
			if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
				stringBuilder.Append(c);
		}
		return stringBuilder.ToString();
	}
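
To try the function out, here is a minimal sketch of a complete console program. The class name DiacriticsDemo, the ExpandLigatures helper and its lookup table are assumptions made up for this example, not part of the original function. The helper is there because characters such as 'œ' have no Unicode decomposition, so stripping combining marks alone will never turn them into 'oe'; if you need that, one option is a small explicit table applied after the diacritics are removed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical table for characters without a Unicode decomposition.
    // The entries are illustrative assumptions; extend them to suit your data.
    private static readonly Dictionary<char, string> Expansions = new Dictionary<char, string>
    {
        { 'œ', "oe" }, { 'Œ', "OE" }, { 'æ', "ae" }, { 'Æ', "AE" }, { 'ß', "ss" }
    };

    // Same technique as the function above: decompose, then drop combining marks.
    public static string RemoveDiacritics(string inputString)
    {
        string normalized = inputString.Normalize(NormalizationForm.FormD);
        StringBuilder builder = new StringBuilder(normalized.Length);
        foreach (char c in normalized)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                builder.Append(c);
        }
        return builder.ToString();
    }

    // Hypothetical second pass: replace characters listed in the table above.
    public static string ExpandLigatures(string inputString)
    {
        StringBuilder builder = new StringBuilder(inputString.Length);
        foreach (char c in inputString)
        {
            string expansion;
            if (Expansions.TryGetValue(c, out expansion))
                builder.Append(expansion);
            else
                builder.Append(c);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("Crème brûlée"));            // Creme brulee
        Console.WriteLine(RemoveDiacritics("Õ Ñ ç"));                    // O N c
        Console.WriteLine(ExpandLigatures(RemoveDiacritics("cœur")));    // coeur
    }
}
```

Keeping the table in a separate pass leaves RemoveDiacritics generic, and you only pay for the lookup when your text can actually contain such characters.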

Comments (1)
  1. This is probably one of the coolest text parsing tricks I have ever seen. Thank you so much!

