enabling UTF-8 support breaks existing code

#1 Mobile Hacker For Hire > Blog > Uncategorized > enabling UTF-8 support breaks existing code

From: “Stefan Kanthak” <stefan.kanthak () nexgo de>
Date: Fri, 10 Feb 2023 14:22:42 +0100

Hi @ll,

almost 4 years ago, with Windows 10 1903, after more than a year
beta-testing in insider previews, Microsoft finally released UTF-8
support for the -A interfaces of the Windows API.

0) <https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page#activeCodePage>

   | If the ANSI code page is configured for UTF-8, -A APIs typically
   | operate in UTF-8. This model has the benefit of supporting
   | existing code built with -A APIs without any code changes.

   The last claim is but a bloody DANGEROUS lie!
   As shown hereafter, it must read instead:
   "This model has the malefit of causing buffer overruns in existing
    code!"


1) For 30 years, the documentation of the -A interfaces for file and
   directory management of the Win32 API states:
   "The maximum path name length is 260 characters."

   See <https://msdn.microsoft.com/en-us/library/aa363855.aspx>
   "CreateDirectoryA function" for example:

   | For the ANSI version of this function, there is a default string
   | size limit for paths of 248 characters (MAX_PATH - enough room
   | for a 8.3 filename). ...
   ...
   | The 255 character limit per path segment still applies.

   This constitutes a contractual GUARANTEE for the product behaviour!


2) The documentation for the file systems supported by Windows says
   too "The maximum length of a file name segment is 255 characters."
   See <https://msdn.microsoft.com/en-us/library/ee681827.aspx>
   "File System Functionality Comparison"


3) With these 2 contractually GUARANTEED preconditions, the following
   code is safe, i.e. not susceptible to a buffer overrun:
   CreateDirectoryA() fails as soon as szPath exceeds the documented
   limit which is less than the buffer size of 260 characters.

   CHAR szANSI[] = "€"; // or one of the following other 122 characters
                        // from ANSI code page 1252:
                        // ‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂ
                        // ÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
   CHAR szPath[MAX_PATH] = "";
   do
      strcat(szPath, szANSI);
   while (CreateDirectoryA(szPath));


4) With UTF-8 support enabled, the same code now suffers from a
   buffer overrun:

   CHAR szUTF8[] = u"€"; // or "\xE2\x82\xAC"
   CHAR szPath[MAX_PATH] = "";
   do
      strcat(szPath, szUTF8);
   while (CreateDirectoryA(szPath), NULL);

   STRIKE 1!


5) Given the 2 guarantees from 1) and 2), the following code is
   also safe and not susceptible to a buffer overrun: see
   <https://msdn.microsoft.com/en-us/library/aa365740.aspx>
   "WIN32_FIND_DATAA structure" and "FindFirstFile function"
   <https://msdn.microsoft.com/en-us/library/aa364418.aspx>

   wfd.cFileName can NEVER receive a file/directory name longer
   than 255 characters, so the concatenation of C: or .\ (as
   well as C:\ and ..\ too) and wfd.cFileName NEVER overruns a
   buffer of MAX_PATH!

   #define PATTERN "C:*" // or "C:\\*" or ".\\*" or "..\\*"

   WIN32_FIND_DATAA wfd;
   CHAR szPath[MAX_PATH] = PATTERN;
   HANDLE hFind = FindFirstFileA(szPath, &wfd);
   if (hFind != INVALID_HANDLE_VALUE) {
      do {
         strcat(szPath + strlen(PATTERN) - 1, wfd.cFileName);
         GetFileAttributesA(szPath); // do something with the
         ...                         // found file system objects
      } while (FindNextFile(hFind, &wfd));
      FindClose(hFind);
   }


6) With UTF-8 support enabled and a file or directory named
   €€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
   (or other 86 characters from the above 123) present in the
   CWD of drive C:, FindFirstFileA() (and FindNextFileA() too)
   return a string of 86 * 3 = 258 characters in wfd.cFileName
   which causes a buffer overrun in previously safe code!

   STRIKE 2!


7) The following code enumerates ALL file system objects in a
   (root) directory or network share:

   WIN32_FIND_DATAA wfd;
   HANDLE hFind = FindFirstFileA("\\\\host\\share\\*", &wfd);
   if (hFind != INVALID_HANDLE_VALUE) {
      do { // code to process wfd.cFileName omitted
      } while (FindNextFile(hFind, &wfd));
      FindClose(hFind);
   }


8) With UTF-8 support enabled and a directory or file named
   €€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
   (i.e. 87 or more of the 123 characters from above) present,
   both FindFirstFile() and FindNextFile() FAIL with the
   previously impossible, NEVER encountered Win32 error code
   234 alias ERROR_MORE_DATA: wfd.cFileName is to short for
   UTF-8 encoded file/directory segment names!

   STRIKE 3!


stay tuned, and far away from UTF-8 in Windows
Stefan Kanthak

PS: for the full story of Microsoft's EPIC failures with UTF-8
    in Windows, see
    <https://skanthak.homepage.t-online.de/quirks.html#quirk33>
    <https://skanthak.homepage.t-online.de/quirks.html#quirk32>
    <https://skanthak.homepage.t-online.de/quirks.html#quirk31>

_______________________________________________
Sent through the Full Disclosure mailing list
https://nmap.org/mailman/listinfo/fulldisclosure
Web Archives & RSS: https://seclists.org/fulldisclosure/

Current thread:

Defense in depth — the Microsoft way (part 81): enabling UTF-8 support breaks existing code Stefan Kanthak (Feb 14)

enabling UTF-8 support breaks existing code

Table of Contents

Current thread:

Leave a Reply Cancel reply

Useful Links

Subscribe Now

enabling UTF-8 support breaks existing code

Table of Contents

Current thread:

Leave a Reply Cancel reply

Related Post

Useful Links

Subscribe Now