PHP: strlen - Wrong result for Diacritics, Accents and Unicode Characters
Question by Guest | 2016-06-13 at 15:40
I am using the PHP function strlen() to check the length of some user input. Unfortunately, the function only seems to work for strings that do not contain diacritics, accents, umlauts or other Unicode characters.
echo strlen("abc"); // result: 3
echo strlen("äbc"); // result: 4
In my script, I need the exact number of characters, so I would like to get the result 3 for both "abc" and "äbc".
Is this a PHP bug or what can I do to solve this?
Depending on the character encoding, a single character can be encoded in more than one byte. A typical example is UTF-8, which is used for most websites. In this encoding, characters like a-z are encoded with one byte, while characters like ä, ü or ö are encoded with two bytes each.
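To make the byte counts visible, here is a minimal sketch that prints the raw bytes of "a" and "ä". The variable names are only illustrative, and "ä" is written as an explicit UTF-8 byte sequence so that the result does not depend on how the script file itself is saved.

```php
<?php
// "a" is a plain ASCII character: one byte in UTF-8.
$plain = "a";

// "ä" (U+00E4) written as its two UTF-8 bytes.
$umlaut = "\xC3\xA4";

echo bin2hex($plain), "\n";  // 61
echo bin2hex($umlaut), "\n"; // c3a4
echo strlen($umlaut), "\n";  // 2
```

This is why strlen("äbc") returns 4 on a UTF-8 page: two bytes for "ä" plus one byte each for "b" and "c".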
The function strlen() only counts the number of bytes. It does not take the meaning or the encoding of those bytes into account, so this is not a PHP bug. To get the number of characters, use the function mb_strlen() instead of strlen().
Unlike strlen(), the multibyte function mb_strlen() takes the encoding into account and returns the number of characters rather than the number of bytes.
The first parameter is the string you want to check; the encoding can be passed as the second parameter. If your website is UTF-8 encoded, like most websites, pass "UTF-8" here. If you omit the second parameter, PHP falls back to the internal encoding (see mb_internal_encoding()).
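A short sketch of how this could look in practice, assuming the input is UTF-8 and the mbstring extension is enabled. The helper name char_length() is made up for this example and is not part of PHP.

```php
<?php
// Hypothetical helper: returns the number of characters in a UTF-8 string.
// Requires the mbstring extension.
function char_length(string $text): int
{
    // Pass the encoding explicitly instead of relying on the configured default.
    return mb_strlen($text, 'UTF-8');
}

// "äbc" with "ä" written as explicit UTF-8 bytes.
$input = "\xC3\xA4bc";

echo strlen($input), "\n";      // 4 (bytes)
echo char_length($input), "\n"; // 3 (characters)
```

Note that mb_strlen() counts code points. An "ä" that arrives as "a" followed by a combining diaeresis (U+0308) would still count as 2, so if your input may contain such sequences, you might need to normalize it first or look at grapheme_strlen() from the intl extension.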
2016-06-13 at 23:16