
PHP: strlen - Wrong result for Diacritics, Accents and Unicode Characters

Question by Guest | 2016-06-13 at 15:40

I am using the PHP function strlen() to check the length of user input. Unfortunately, the function only returns the expected result for strings that contain no diacritics, accents, umlauts or other non-ASCII characters.

echo strlen("abc");  // result: 3
echo strlen("äbc");  // result: 4

In my script, I need the exact number of characters, so I would like to get the result 3 for both "abc" and "äbc".

Is this a PHP bug, or what can I do to solve this?

Best Answer

Depending on the character encoding in use, a single character may be encoded in more than one byte. A typical example is UTF-8, which is used by most websites. In this encoding, characters like a-z are encoded with one byte each, while characters like ä, ü or ö take two bytes each.
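To make the byte counts visible, here is a minimal sketch. It assumes UTF-8 and writes "ä" as explicit byte escapes, so the result does not depend on how the script file itself is saved. The variable names are only for illustration.

```php
<?php
// "a" is a single byte in UTF-8.
$a = "a";

// "ä" (U+00E4) written as its two UTF-8 bytes, 0xC3 0xA4.
$ae = "\xC3\xA4";

echo strlen($a), "\n";    // 1
echo strlen($ae), "\n";   // 2
echo bin2hex($ae), "\n";  // c3a4
```

This is why strlen("äbc") returns 4: two bytes for "ä" plus one byte each for "b" and "c".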

The function strlen() only counts the number of bytes; it does not take the meaning or encoding of those bytes into account. So this is not a PHP bug. To count characters, use mb_strlen() instead of strlen():

echo mb_strlen("abc", "utf-8");  // result: 3
echo mb_strlen("äbc", "utf-8");  // result: 3

Unlike strlen(), the multibyte function mb_strlen() takes the encoding into account and returns the number of characters.

The first parameter is the string you want to check; the encoding is passed as the second parameter. If your website is UTF-8 encoded, as most websites are, specify "utf-8" here. If you omit the second parameter, mb_strlen() falls back to the internal encoding (see mb_internal_encoding()).
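For checking user input, as in the question, something like the following sketch could work. The helper name hasLengthBetween() and its limits are hypothetical and not part of PHP; the sketch assumes the mbstring extension is available and that the input is expected to be UTF-8.

```php
<?php
// Hypothetical helper (not a built-in): checks the length of user input
// in characters rather than bytes. Requires the mbstring extension.
function hasLengthBetween($input, $min, $max)
{
    // Reject byte sequences that are not valid UTF-8 before counting.
    if (!mb_check_encoding($input, 'UTF-8')) {
        return false;
    }

    $length = mb_strlen($input, 'UTF-8');

    return $length >= $min && $length <= $max;
}

// "äbc" with "ä" written as explicit UTF-8 bytes: 3 characters, 4 bytes.
$input = "\xC3\xA4" . "bc";

var_dump(hasLengthBetween($input, 3, 3));   // bool(true)
var_dump(hasLengthBetween($input, 4, 10));  // bool(false)
var_dump(hasLengthBetween("\xC3", 1, 10));  // bool(false), truncated UTF-8 sequence
```

Validating the encoding first is a design choice of this sketch, so that malformed input is rejected instead of being counted.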
2016-06-13 at 23:16
