PHP: Iterate UTF-8 String Character by Character
Question by Guest | 2014-03-13 at 20:28
I would like to iterate through a PHP string character by character. So far, I have written the following loop for this purpose:
$s = 'abc'; // works
$s = 'äüö'; // does not work

for ($i = 0; $i < strlen($s); $i++) {
    $c = $s[$i];
}
Unfortunately, this only works for ASCII characters such as "abc". As soon as the string contains non-ASCII letters such as umlauts ("äüö"), it no longer works. Apparently strlen and the $s[$i] index access are not aware of multibyte characters (ä consists of 2 bytes in UTF-8 encoding).
How can I still iterate through the string character by character in a loop?
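To make the byte-versus-character mismatch visible, here is a small sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$s = 'äüö';

$byteCount = strlen($s);             // strlen counts bytes, not characters
$charCount = mb_strlen($s, 'UTF-8'); // mb_strlen counts UTF-8 characters

// $s[0] is only the first of the two bytes that make up "ä",
// so on its own it is not a valid UTF-8 sequence.
$firstByte = $s[0];
$firstByteIsValidUtf8 = mb_check_encoding($firstByte, 'UTF-8');
```

With the three umlauts above, the byte count is twice the character count, which is why the loop in the question ends up handing out half characters.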
Indeed, most of the standard PHP string functions are not multibyte-aware; they work on bytes rather than on characters.
But you can do it with preg_split.
Using preg_split with an empty pattern, you split the string into single characters and store them in an array. After that, you can easily loop through the array to access the individual characters.
By the way, we use the modifier "u" so that the pattern and the string are treated as UTF-8.
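A minimal sketch of this approach could look as follows. The exact code of the original answer is not preserved here, so the pattern, the flags and the variable names are my assumptions about what was meant; the file is assumed to be saved as UTF-8.

```php
<?php
// Split a UTF-8 string into an array of single characters.
// The empty pattern matches between every two characters; the "u"
// modifier makes PCRE treat pattern and subject as UTF-8.
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
$s = 'äüö';
$chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

// Loop over the characters instead of over the bytes.
foreach ($chars as $c) {
    echo $c, "\n";
}
```

Note that preg_split returns false if the input is not valid UTF-8, so it may be worth checking the result before looping.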
2014-03-13 at 23:32
I think the most efficient way to process each character in a UTF-8 (or similarly encoded) string would be to work through the string using mb_substr. In each iteration of the processing loop, mb_substr would be called twice: once to get the next character and once to get the remaining string. Only the remaining string would be passed on to the next iteration. This way, the main overhead in each iteration would be finding the end of the next character (done twice), which takes only one to five or so operations, depending on the byte length of the character.
If this description is not clear, let me know and I'll provide a working PHP function.
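For reference, the loop described above might look roughly like this. It is only a sketch: the helper name each_utf8_char and its callback parameter are hypothetical, and it assumes valid UTF-8 input, a UTF-8 source file and a loaded mbstring extension.

```php
<?php
// Hypothetical helper, not taken from the post above.
// Calls $fn once for every character of the UTF-8 string $s.
function each_utf8_char(string $s, callable $fn): void
{
    while ($s !== '') {
        $c = mb_substr($s, 0, 1, 'UTF-8');    // the next character
        $s = mb_substr($s, 1, null, 'UTF-8'); // the remaining string
        $fn($c);
    }
}

// Usage: collect the characters of a mixed ASCII/umlaut string.
$seen = [];
each_utf8_char('aäb', function (string $c) use (&$seen): void {
    $seen[] = $c;
});
```

One thing to keep in mind: the second mb_substr call builds a new copy of the remaining string on every pass, so for long strings it is probably worth measuring this against the preg_split variant.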
2016-02-29 at 23:56
Yes, please. A working example would help me see what you mean. Thank you very much in advance!
2016-03-01 at 00:31