HTML: Eliminate problems with special characters and character encodings

Tip by Stefan Trost | Last update on 2022-12-27 | Created on 2012-01-18

Almost every web user has seen it before: strange black boxes or meaningless symbols appear in a text where accented characters or umlauts should be. Such a page is not only difficult to read, it also looks extremely unprofessional.

In this short tip, I would like to show you two ways to get the problem under control.

The old solution

In the past, the problem was usually solved with so-called "named entities". Named entities are nothing more than an alternative notation for special characters, umlauts and other unusual characters. The following table shows the most important named entities for German as an example:

Character   HTML        Character   HTML
Ä           &Auml;      ä           &auml;
Ö           &Ouml;      ö           &ouml;
Ü           &Uuml;      ü           &uuml;
ß           &szlig;     "           &quot;
<           &lt;        >           &gt;

So, instead of writing "ä", you simply write "&auml;" in the HTML code and the character is displayed correctly. The problem with this old solution: nobody really wants to type these cumbersome codes by hand, unless you have an editor that does this job for you.
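
If you would rather not type entities by hand, a script can do the conversion. The following is a minimal PHP sketch, assuming the input string is UTF-8 encoded. The sample text is made up for illustration, and its special characters are written as \u{...} escapes (PHP 7 or later) so that the result does not depend on how the file itself is saved:

```php
<?php
// Sample text "Größe <b>" with ö and ß written as Unicode escapes.
$text = "Gr\u{00F6}\u{00DF}e <b>";

// htmlentities() replaces every character that has a named entity.
// ENT_HTML401 selects the classic HTML 4.01 entity table.
$encoded = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $encoded, "\n"; // Gr&ouml;&szlig;e &lt;b&gt;

// html_entity_decode() reverses the conversion.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');
```

If only the characters with a special meaning in HTML (<, >, & and quotes) need to be replaced, htmlspecialchars() is the narrower alternative.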

The new solution

Luckily, there are now better ways to accomplish the same goal. The solution is called UTF-8. UTF-8 is an encoding of Unicode that can represent every character, regardless of the script or language it comes from. Older encodings such as Latin1 (ISO-8859-1) cover only a limited set of characters beyond the standard Latin letters A to Z, and which special characters are available depends on the character set in use. As a result, umlauts, for example, were sometimes lost or displayed incorrectly.
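
The difference becomes visible at the byte level. The following PHP sketch assumes that the mbstring extension is available. It shows that "ä" is stored as two bytes in UTF-8 but as a single byte in Latin1, and what happens when UTF-8 bytes are wrongly read as Latin1:

```php
<?php
// "ä" (U+00E4) as a UTF-8 string, written as an escape to be independent
// of the encoding this file is saved in.
$utf8 = "\u{00E4}";

// The same character converted to Latin1 (ISO-8859-1).
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), "\n";   // c3a4 (two bytes)
echo bin2hex($latin1), "\n"; // e4   (one byte)

// Simulate a wrong declaration: treat the UTF-8 bytes as if they were Latin1.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $garbled, "\n"; // Ã¤ (two characters instead of one umlaut)
```

The last line is the kind of garbage a visitor sees when the encoding of the file and the encoding announced to the browser do not match.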

UTF-8 is different: every character has a representation. To use UTF-8, you just have to save your PHP or HTML file in UTF-8 format. You can do this by selecting the encoding when saving the file in your editor, or you can use a program like the Text Converter to automatically convert multiple files from any other encoding to UTF-8.
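
For a handful of files, the conversion can also be scripted. The following is a minimal PHP sketch, assuming that the mbstring extension is available and that anything which is not already valid UTF-8 is Latin1. The function name convertToUtf8 is made up for this example:

```php
<?php
/**
 * Returns $bytes as UTF-8. Input that is already valid UTF-8 is returned
 * unchanged, everything else is assumed to be in $sourceEncoding.
 */
function convertToUtf8(string $bytes, string $sourceEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $sourceEncoding);
}

// "Tür" in Latin1: the ü is the single byte 0xFC.
$latin1 = "T\xFCr";
$utf8   = convertToUtf8($latin1);

echo bin2hex($utf8), "\n"; // 54c3bc72
```

To convert a file, read it with file_get_contents(), pass the contents through the function and write the result back with file_put_contents(). Keep a backup, because a wrong guess about the source encoding produces garbled text.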

Afterwards, you must still tell the browser that the HTML file is UTF-8 encoded. Otherwise, the browser does not know how to interpret the data. This can be achieved through the following meta tag in the head of the HTML page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Alternatively you can also send a header via PHP:

header('Content-Type: text/html; charset=utf-8');

This PHP code must be executed before any other output of the page is sent. Because this information is part of the HTTP response header, the browser knows even earlier how the page is to be interpreted.
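
Put together, both declarations might look like this in a small PHP page. This is only a sketch: the function name renderPage and the page content are invented for illustration, and the file itself is assumed to be saved as UTF-8:

```php
<?php
// Builds a minimal HTML page that declares UTF-8 in its head.
function renderPage(string $title, string $body): string
{
    // Escape the dynamic parts so that <, > and & cannot break the markup.
    $title = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $body  = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
         . "<html>\n<head>\n"
         . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n"
         . "<title>$title</title>\n"
         . "</head>\n<body>\n<p>$body</p>\n</body>\n</html>\n";
}

// The header has to be sent before the first byte of output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Umlauts are written directly, no named entities required.
$page = renderPage('Grüße', 'Äpfel & Öl');
echo $page;
```

Only the characters with a special meaning in HTML are still escaped here; the umlauts pass through unchanged because the file, the header and the meta tag all agree on UTF-8.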
