PHP: Permit only certain Letters, Numbers and Characters in a String

Tip by Stefan Trost | Last update on 2024-05-07 | Created on 2013-01-19

Today, I would like to show you how to use PHP to check whether a string consists only of certain letters, numbers or characters.

A typical area of application is checking user input such as user names, which may contain only specific characters or character groups.

Basics

First, we would like to have a look at the following PHP code:

if (preg_match("#^[a-zA-Z0-9]+$#", $text)) {
   echo 'String only contains numbers and letters.';
} else {
   echo 'String also contains other characters.';
}

With this code, we check whether the string $text consists only of uppercase and lowercase letters (a to z as well as A to Z) or numbers (digits 0 to 9). For this, we use the function preg_match() with a regular expression. The first parameter is the regular expression, with the actual pattern written between the two "#" delimiters.

The symbol ^ represents the beginning of the string, the $ stands for the end of the string. In between, only the characters defined by the character class in the square brackets are allowed. The + sign behind the square brackets indicates that characters from this class must occur at least once and may be repeated any number of times, so an empty string does not pass the check. Within the character class, we have written "a-z" for the lowercase letters, "A-Z" for the uppercase letters and "0-9" for the digits. Note that $ also matches directly before a single line break at the very end of the string; if that matters, \z can be used as the end anchor instead.
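
As a minimal sketch, the check can be wrapped in a small helper function. The name isAsciiAlnum is hypothetical and not part of the original tip. The sketch uses \z instead of $ as the end anchor, which is an assumption about what is wanted: with $, a string that ends in a single line break would still be accepted.

```php
<?php
// Hypothetical helper (the name is chosen for illustration only).
function isAsciiAlnum(string $text): bool
{
    // preg_match() returns 1 on a match, 0 on no match and false on error,
    // so we compare against 1 explicitly.
    // \z anchors at the very end of the string; "$" would also accept "abc\n".
    return preg_match('#^[a-zA-Z0-9]+\z#', $text) === 1;
}

var_dump(isAsciiAlnum('User42'));     // bool(true)
var_dump(isAsciiAlnum('user name'));  // bool(false), contains a space
var_dump(isAsciiAlnum(''));           // bool(false), + requires at least one character
var_dump(isAsciiAlnum("User42\n"));   // bool(false), because of \z

// For comparison: the same pattern with "$" accepts the trailing line break.
var_dump(preg_match('#^[a-zA-Z0-9]+$#', "User42\n")); // int(1)
```

The pattern is written in single quotes here so that PHP does not try to interpret any part of it as a variable or escape sequence.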

Check for different Characters and Character Classes

Of course, our check is not only limited to these character classes, because we can extend or change our regular expression as desired.

We would like to take a look at some examples for this as well as at some other modifications in the following sections of this tutorial:

Also check for other Characters like Accents and Umlauts

Naturally, letters with accents and umlauts such as the German letters Ä, Ö or Ü are not included in the ranges a-z and A-Z that we used in our first example, although they can occur in normal words depending on the language of the text. If we also want to allow these characters, we have to add them to the character class individually (here in both their lowercase and their uppercase variants):

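// The u modifier makes PHP treat pattern and string as UTF-8, so each umlaut counts as one character.
if (preg_match("#^[a-zA-Z0-9äöüÄÖÜ]+$#u", $text)) {
   echo 'String only contains letters including umlauts and digits.';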
} else {
   echo 'String also contains other characters.';
}

In this way, we can add any other character to our character class in order to allow it in the checked string, too.
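
One detail worth checking with multi-byte characters such as umlauts is the u modifier, which switches the regular expression to UTF-8 mode. The following sketch illustrates the difference. It assumes that the PHP file itself is saved as UTF-8, so that "ä" is stored as two bytes.

```php
<?php
// Assumption: this file is saved as UTF-8, so "ä" consists of two bytes.

// Without the u modifier the class matches single bytes, so one class item
// cannot match the two-byte character "ä".
var_dump(preg_match('#^[äöü]$#', 'ä'));      // int(0)

// With the u modifier "ä" is treated as one character.
var_dump(preg_match('#^[äöü]$#u', 'ä'));     // int(1)

// Byte-wise matching also accepts a lone lead byte, which is not valid UTF-8.
var_dump(preg_match('#^[äöü]+$#', "\xC3"));  // int(1)

// In UTF-8 mode the invalid string is rejected: preg_match() returns false.
var_dump(preg_match('#^[äöü]+$#u', "\xC3")); // bool(false)
```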

However, this approach quickly becomes cumbersome when we consider how many variants of Latin letters alone exist. Fortunately, it is not necessary to list all letter variants, as the next section shows. We should therefore use the approach shown here only if we really want to allow just a limited selection of letters, such as the German letters in this example.

Check for Letters in general

Up to now, we have defined our character sets by writing each letter or range explicitly into our character class.

But how do we test a string if we want to allow every other possible letter such as è, ø, é or ă as well? After all, it is not practical to write all conceivable letters into our regular expression. The following example shows how we can handle this situation:

$text = "abcABCäöüÄÖÜßéëèâøçñúœЖЛЩΘΨ";

if (preg_match("#^\p{L}+$#u", $text)) {
   echo 'String contains arbitrary letters.';
}

\p stands for a character that has the property named in the curly brackets behind it, L stands for an arbitrary letter, and the modifier u after the closing delimiter switches on Unicode (UTF-8) mode. So, this regular expression checks for one or more characters with a Unicode code point from the category "letter". This means that it matches any letters such as Ë, Ç, Â, Ñ, Ú, Ö or Œ, while preg_match() returns 0 as soon as, for example, a digit or a punctuation mark occurs in the text to be checked. By the way, "letters" refers not only to the Latin letters mentioned here but also to letters from other alphabets such as Cyrillic or Greek. The string assigned to the variable $text in the example code would therefore match.
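
The behaviour described here can be tried out with a few sample strings. The helper name isLettersOnly is hypothetical; the pattern is the one from the example above. The last lines show the raw return values of preg_match(), including the case of a string that is not valid UTF-8.

```php
<?php
// Hypothetical helper (name for illustration only).
function isLettersOnly(string $text): bool
{
    return preg_match('#^\p{L}+$#u', $text) === 1;
}

var_dump(isLettersOnly('Zoë'));   // bool(true)
var_dump(isLettersOnly('Жук'));   // bool(true), Cyrillic letters
var_dump(isLettersOnly('abc1'));  // bool(false), contains a digit
var_dump(isLettersOnly('a-b'));   // bool(false), contains a hyphen

// Raw return values: 0 means "no match", false means "error".
var_dump(preg_match('#^\p{L}+$#u', 'abc1'));         // int(0)
var_dump(preg_match('#^\p{L}+$#u', "\xC3\x28"));     // bool(false), invalid UTF-8
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```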

With L, we allow lowercase as well as uppercase letters. If you only want to check for lowercase or for uppercase letters, you can use "Ll" (letter, lowercase) or "Lu" (letter, uppercase) instead of L. If we only want to allow Latin letters, we can write "Latin" instead. In the same way, we can use script names such as "Arabic", "Braille", "Cyrillic", "Egyptian_Hieroglyphs", "Georgian", "Greek", "Han" (Chinese), "Hebrew", "Hiragana", "Katakana" (both Japanese), "Thai" and some others.
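
A short sketch of how some of these property names behave. The sample strings are chosen only for illustration.

```php
<?php
// Category properties: uppercase and lowercase letters of any script.
var_dump(preg_match('#^\p{Lu}+$#u', 'ÄÖÜ'));       // int(1)
var_dump(preg_match('#^\p{Lu}+$#u', 'Äöü'));       // int(0), contains lowercase letters
var_dump(preg_match('#^\p{Ll}+$#u', 'äöüß'));      // int(1)

// Script properties: characters belonging to a certain script.
var_dump(preg_match('#^\p{Latin}+$#u', 'Straße')); // int(1)
var_dump(preg_match('#^\p{Latin}+$#u', 'ЖЛЩ'));    // int(0), Cyrillic
var_dump(preg_match('#^\p{Cyrillic}+$#u', 'ЖЛЩ')); // int(1)
var_dump(preg_match('#^\p{Greek}+$#u', 'ΘΨ'));     // int(1)
```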

This \p{x} notation was added in PHP version 5.1.0, so it cannot be used with older PHP versions, where you have to use the explicit character classes presented previously instead. When using this notation, we should also note that a check for Unicode properties is not particularly fast due to the large number of Unicode characters.

Check for all Letters and Numbers

In the first code examples of this tutorial, we allowed not only letters but also digits in our string.

If we want to extend the example from the last section in the same way in order to allow both arbitrary letters and digits, we can modify the check as follows:

if (preg_match("#^[\p{L}0-9]+$#u", $text)) {
   echo 'String consists of arbitrary letters and/or the digits 0 to 9.';	
}

Using the square brackets, we have defined a character class that consists of any letters (\p{L}) and the digits 0 to 9 (0-9), which can be repeated arbitrarily.

Apart from letters, this regular expression really only accepts the digits 0 to 9. As soon as our string contains other numerical characters such as the superscript numbers ² or ³, the check fails. If we want to allow these characters as well, we do not have to list all possible numerical characters (unless, of course, we only want to allow a certain selection), just as with the letters before. Instead, we can work with the \p{} notation again, this time using the letter "N" for all numerical characters:

if (preg_match("#^[\p{L}\p{N}]+$#u", $text)) {
   echo 'String consists of arbitrary letters and/or numerical characters.';	
}

As you can see, we have now formed a character class consisting of \p{L} (any letters) and \p{N} (any numerical characters).

Important: A notation like \p{LN} or similar is not possible. We have to write both separately in the form shown here.
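
The difference between 0-9 and \p{N} can be made visible with a few test strings. As an addition that is not part of the original examples, the sketch also shows the narrower property \p{Nd} (decimal digits of any script), which may be useful if superscripts and fractions should not be allowed.

```php
<?php
$asciiDigits = '#^[\p{L}0-9]+$#u';     // letters plus the digits 0 to 9
$anyNumeric  = '#^[\p{L}\p{N}]+$#u';   // letters plus any numerical character
$decimalOnly = '#^[\p{L}\p{Nd}]+$#u';  // letters plus decimal digits of any script

var_dump(preg_match($asciiDigits, 'x2'));         // int(1)
var_dump(preg_match($asciiDigits, 'x²'));         // int(0), ² is not in 0-9
var_dump(preg_match($anyNumeric, 'x²'));          // int(1)
var_dump(preg_match($decimalOnly, 'x²'));         // int(0), ² is not a decimal digit
var_dump(preg_match($decimalOnly, "x\u{0663}"));  // int(1), Arabic-Indic digit three
```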

Allow only Letters or only Numbers

To conclude this part, we would like to look at how we can use the character class definitions we have learned so far to allow only a single group of characters instead of testing a string for several groups at the same time. This lets us determine which group of characters a string belongs to.

To do this, we will look at two code examples that check whether a string consists exclusively of letters or exclusively of numbers. Let's start with the letters:

if (preg_match("#^[A-Z]+$#", $s)) {
   echo 'String consists only of capital letters from A to Z.';
} else if (preg_match("#^[a-z]+$#", $s)) {
   echo 'String consists only of lowercase letters from a to z.';
} else if (preg_match("#^[A-Za-z]+$#", $s)) {
   echo 'String consists only of upper and lower case letters from A to Z.';
} else if (preg_match("#^[\p{Latin}]+$#u", $s) && preg_match("#^[\p{Lu}]+$#u", $s)) {
   echo 'String consists only of uppercase Latin letters.';
} else if (preg_match("#^[\p{Latin}]+$#u", $s) && preg_match("#^[\p{Ll}]+$#u", $s)) {
   echo 'String consists only of lowercase Latin letters.';
} else if (preg_match("#^[\p{Latin}]+$#u", $s)) {
   echo 'String consists only of Latin letters.';
} else if (preg_match("#^[\p{Lu}]+$#u", $s)) {
   echo 'String consists only of uppercase letters.';
} else if (preg_match("#^[\p{Ll}]+$#u", $s)) {
   echo 'String consists only of lowercase letters.';
} else if (preg_match("#^[\p{L}]+$#u", $s)) {
   echo 'String consists only of letters.';
} else {
   echo 'String contains other or mixed characters.';
}

This code checks the string $s to see if it only consists of letters from a certain group. We start by checking whether $s contains only uppercase letters from A to Z. If this is not the case, we next check for the lowercase letters from a to z and then for uppercase and lowercase letters from A to Z (so at this point not only "ABC" or "abc" would match but also "Abc").

If this is not the case either, we continue to check for uppercase, lowercase or general Latin letters (that is, in addition to the letters from A to Z, also other Latin letters such as letters with accents or umlauts). After that, we check for general uppercase letters (in addition to A to Z, accents and umlauts, the string may now also contain letters from other alphabets, as long as they are uppercase), for general lowercase letters and finally for general letters (that is, both lowercase and uppercase letters with or without accents from any alphabet). If the string cannot be clearly assigned to any of these groups, we finally output a default message.

We have combined the checks for Latin uppercase and Latin lowercase letters from two individual tests each (check for Latin letters plus check for uppercase or lowercase letters), since a single \p{} can contain either a script such as "Latin" or a category such as "Lu" or "Ll", but not both.
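
As an alternative sketch, the two-step test for Latin uppercase letters can also be expressed in one pattern with a lookahead. This is not the approach used in the code above, only one possible variation: (?=\p{Latin}) checks that the next character belongs to the Latin script without consuming it, and \p{Lu} then consumes it as an uppercase letter.

```php
<?php
// Each character must be Latin (lookahead) AND an uppercase letter.
$latinUpper = '#^(?:(?=\p{Latin})\p{Lu})+$#u';

var_dump(preg_match($latinUpper, 'ÄBC'));  // int(1)
var_dump(preg_match($latinUpper, 'Äbc'));  // int(0), contains lowercase letters
var_dump(preg_match($latinUpper, 'ЖЛЩ'));  // int(0), uppercase but Cyrillic

// The same idea for Latin lowercase letters.
$latinLower = '#^(?:(?=\p{Latin})\p{Ll})+$#u';
var_dump(preg_match($latinLower, 'äbc'));  // int(1)
```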

We can also program a similar check for numbers:

if (preg_match("#^[0-9]+$#", $s)) {
   echo 'String consists only of the digits from 0 to 9.';
} else if (preg_match("#^[\p{N}]+$#u", $s)) {
   echo 'String consists only of numeric characters.';
} else {
   echo 'String contains other or mixed characters.';
}

Here, we first check whether $s consists exclusively of the digits 0 to 9. If this is not the case, we check for general numerical characters. At this level, our string may contain not only the digits from 0 to 9 but also other numerical characters such as ² or ½. If this is not the case either, we output a default message. This message would be displayed, for example, if we tested a string that contains any letters or punctuation marks.
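
The same cascade can be sketched as a function that returns a label instead of printing a message. The function name classifyNumeric and the returned labels are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helper: reports which group of numerical characters a string belongs to.
function classifyNumeric(string $s): string
{
    if (preg_match('#^[0-9]+\z#', $s) === 1) {
        return 'digits 0-9';           // only the digits 0 to 9
    }
    if (preg_match('#^\p{N}+\z#u', $s) === 1) {
        return 'numeric characters';   // any numerical characters
    }
    return 'other or mixed';
}

echo classifyNumeric('2024'), "\n";  // digits 0-9
echo classifyNumeric('½'), "\n";     // numeric characters
echo classifyNumeric('4²'), "\n";    // numeric characters
echo classifyNumeric('4a'), "\n";    // other or mixed
```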

Only allow certain individual Characters

So far in this tutorial we have mainly looked at character classes such as all possible uppercase letters, all lowercase letters, all possible digits or all Latin letters. However, we can also proceed in the same way if we only want to allow a specific selection of individual letters or characters in our string.

To do this, it is sufficient to simply write the letters that we want to allow individually into our regular expression, as the next example shows:

if (preg_match("#^[abx]+$#", $text)) {
   echo 'String consists only of the letters a, b and x.';
}

Here we exclusively test for the letters a, b and x. This means that strings like "aaa", "bax" or "bab" would be allowed, but as soon as a single other character appears within the string, this expression would be false.

Of course we can also mix letters and numbers as we wish. For example, with the following code we want to only allow lowercase vowels, the capitalized X as well as the numbers 2, 3, and 7:

if (preg_match("#^[aeiou237X]+$#", $text)) {
   echo 'String consists only of vowels, X and the digits 2, 3 and 7.';
}

If we want to allow multiple letters that appear one after the other in the alphabet, we can also define them as a range within the character class. We can not only use complete ranges as we have seen so far (such as "A-Z") but also select partial ranges, as the next two examples show:

if (preg_match("#^[C-H]+$#", $text)) {
   echo 'String consists only of the letters C, D, E, F, G and H.';
}

In this example we have selected the section C to H. Accordingly, this expression is only true for strings that consist exclusively of the letters C, D, E, F, G and H.

if (preg_match("#^[a-oq-z]+$#", $text)) {
   echo 'String consists only of the lowercase letters a to z except p.';
}

And in this example, we want to allow all lowercase letters from a to z except the lowercase p. To exclude p, we wrote two ranges into our character class: a to o and q to z.
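
To see the patterns from this section side by side, the following sketch runs a string against three of them and reports which ones it satisfies. The function name matchingSets and the labels are hypothetical.

```php
<?php
// Hypothetical helper: lists which of the character sets a string consists of exclusively.
function matchingSets(string $text): array
{
    $sets = [
        'a, b, x'          => '#^[abx]+$#',
        'C to H'           => '#^[C-H]+$#',
        'a to z without p' => '#^[a-oq-z]+$#',
    ];

    $hits = [];
    foreach ($sets as $label => $pattern) {
        if (preg_match($pattern, $text) === 1) {
            $hits[] = $label;
        }
    }
    return $hits;
}

echo implode(' | ', matchingSets('bax')), "\n";    // a, b, x | a to z without p
echo implode(' | ', matchingSets('DECH')), "\n";   // C to H
echo implode(' | ', matchingSets('zebra')), "\n";  // a to z without p
echo count(matchingSets('apple')), "\n";           // 0, because of the letter p
```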

Allow Spaces

By the way, we can add not only letters or numbers to our character class but also spaces. In this example, we have added a space to a character class of letters and numbers:

if (preg_match("#^[a-zA-Z0-9 ]+$#", $text)) {
   echo 'String only contains letters, digits and spaces.';
} else {
   echo 'String also contains other characters.';
}

With this, strings are permitted that contain spaces in addition to the specified letters and numbers.

Of course, we can also add the space to the character class [\p{L}\p{N} ] introduced before in order to allow spaces together with any letters and numerical characters instead of just the letters and digits a-z, A-Z and 0-9.
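
A brief sketch of what the added space does and does not cover. The second pattern is a hypothetical stricter variant, not part of the original tip, for the case that a string should consist of words separated by single spaces rather than of spaces in arbitrary positions.

```php
<?php
$withSpaces = '#^[\p{L}\p{N} ]+$#u';

var_dump(preg_match($withSpaces, 'Jörg Müller 2'));  // int(1)
var_dump(preg_match($withSpaces, "Jörg\tMüller"));   // int(0), a tab is not a space
var_dump(preg_match($withSpaces, '   '));            // int(1), spaces only also pass

// Hypothetical stricter variant: words separated by exactly one space each.
$singleSpaced = '#^[\p{L}\p{N}]+(?: [\p{L}\p{N}]+)*$#u';

var_dump(preg_match($singleSpaced, 'Jörg Müller 2'));  // int(1)
var_dump(preg_match($singleSpaced, '   '));            // int(0)
var_dump(preg_match($singleSpaced, ' Jörg'));          // int(0), leading space
```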

Points, Brackets and other Special Characters

Some characters have a special meaning within a regular expression. These include, for example, dots or brackets. If we would like to include such characters in our character class, we can write this in the following way:

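// The u modifier makes PHP treat pattern and string as UTF-8, so each umlaut counts as one character.
if (preg_match("#^[a-zA-Z0-9äöüÄÖÜ \.\]]+$#u", $text)) {
   echo 'String only contains the listed characters.';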
}

With \ we can escape the respective characters. That means: with \ we say that the next character should be understood as a literal character and not as a regex instruction. In the example, we add a dot and a closing square bracket to our character class in this way. Note the end of the pattern: the last bracket closes our character class, while the bracket before it has a \ in front of it and therefore becomes part of the character class.

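But what if we also want to add the \ itself as a character to our class? The \ can be escaped by writing another \ in front of it, so the regular expression has to contain \\. Because PHP also treats \\ inside a string literal as an escape for a single backslash, we have to write \\\\ in the PHP source code so that \\ arrives at the regular expression.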
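
A small sketch of how a backslash can be allowed inside a character class. Note that PHP string literals have their own escaping: in a quoted PHP string, \\ stands for one backslash, so four backslashes are written in the source to pass two to the regular expression.

```php
<?php
// In the PHP source we write four backslashes; the pattern that reaches
// the regex engine contains two, which it reads as one literal backslash.
$pattern = '#^[a-z\\\\]+$#';

echo $pattern, "\n";                     // prints #^[a-z\\]+$#

// 'a\\b' is the three-character string a\b.
var_dump(preg_match($pattern, 'a\\b'));  // int(1)
var_dump(preg_match($pattern, 'a/b'));   // int(0), the slash is not allowed
```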

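In total there are 15 different characters that have a special meaning within regular expressions and that may need to be escaped in this way. These metacharacters are [ ] ( ) { } | ? + - * ^ $ \ and the dot. Inside a character class, most of them lose their special meaning: essentially only ] \ ^ and - are special there, but escaping the others does no harm. In addition, the delimiter (here #) has to be escaped if it should appear in the pattern. You can learn more about this topic in the regular expression basics tutorial.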
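
Instead of escaping each special character by hand, PHP's preg_quote() function can do the escaping for us. The following is a minimal sketch under the assumption that the allowed characters are given as a plain, non-empty string; the function name consistsOnlyOf is hypothetical.

```php
<?php
// Hypothetical helper: checks whether $text consists only of characters from $allowed.
// $allowed is a plain list of characters and must not be empty.
function consistsOnlyOf(string $text, string $allowed): bool
{
    // preg_quote() escapes all regex metacharacters, including "-", "]" and "\",
    // as well as the delimiter passed as second argument.
    $class = preg_quote($allowed, '#');

    return preg_match('#^[' . $class . ']+\z#u', $text) === 1;
}

var_dump(consistsOnlyOf('a.b]c', 'abc.]'));  // bool(true)
var_dump(consistsOnlyOf('a[b]', 'ab[]'));    // bool(true)
var_dump(consistsOnlyOf('x^y', '^xy'));      // bool(true)
var_dump(consistsOnlyOf('a-c', 'a-c'));      // bool(true), hyphen is a literal here
var_dump(consistsOnlyOf('b', 'a-c'));        // bool(false), "a-c" is not a range here
```

Since every character is escaped, ranges such as a-z cannot be passed to this helper; each allowed character has to be listed individually.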

In our examples so far, we have always reacted to a successful match of our chosen character class. But we can also reverse the check by negating our if condition with an exclamation mark. The next two examples illustrate the difference:

if (preg_match("#^[a-z]+$#", $text)) {
   echo 'String only contains lowercase letters.';
}

This code shows the regular, non-negated check for lowercase letters from a to z. Our if condition is fulfilled only if $text consists exclusively of these lowercase letters and contains no other letters or characters.

Now let's turn the search around by putting the mentioned exclamation mark before the call of preg_match() while keeping our regular expression:

if (!preg_match("#^[a-z]+$#", $text)) {
   echo 'String does not consist of lowercase letters only.';
}

This time, our if condition is fulfilled whenever $text does not consist exclusively of the allowed lowercase letters. This means that $text can still contain lowercase letters, but as soon as a single other character such as a digit or an uppercase letter appears in $text, our condition is fulfilled and we get the corresponding output. The same is true for an empty string, because the + requires at least one character.
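
A related, but not identical, technique is to negate the character class itself by writing ^ directly after the opening square bracket. Such a class matches any character that is not listed, which makes it possible to find out which character caused the check to fail. The helper name firstDisallowedChar is hypothetical.

```php
<?php
// Hypothetical helper: returns the first character that is not a lowercase
// letter from a to z, or null if there is none (or if the string is invalid UTF-8).
function firstDisallowedChar(string $text): ?string
{
    // [^a-z] is a negated class: any single character that is NOT in a-z.
    // The u modifier ensures that a multi-byte character is returned as a whole.
    if (preg_match('#[^a-z]#u', $text, $match) === 1) {
        return $match[0];
    }
    return null;
}

var_dump(firstDisallowedChar('abcdef'));   // NULL
var_dump(firstDisallowedChar('abcDef'));   // string(1) "D"
var_dump(firstDisallowedChar('abc def'));  // string(1) " "
```

Unlike the negated if condition, this helper reports nothing for an empty string, because an empty string contains no disallowed character.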

Whether we should use a positive search or a negative search in practice obviously depends on the respective area of application and what we want to check.

CType-Functions

For certain frequently used character classes such as the letters a to z and A to Z or the digits, you can alternatively use the CType functions of PHP. You can find more on this topic in my CType String Tutorial.

The advantage of the CType functions is that we do not need a regular expression, and due to this specialization these functions perform better than the preg_match() function used here.
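
A minimal sketch of three of these functions. The expected results are given only for plain ASCII input; how bytes outside the ASCII range (for example in UTF-8 encoded umlauts) are classified can depend on the locale, so no result is stated for that case.

```php
<?php
// ctype_alnum(): only letters and digits
var_dump(ctype_alnum('User42'));     // bool(true)
var_dump(ctype_alnum('user name'));  // bool(false), contains a space

// ctype_alpha(): only letters
var_dump(ctype_alpha('abc'));        // bool(true)

// ctype_digit(): only digits
var_dump(ctype_digit('2024'));       // bool(true)
var_dump(ctype_digit(''));           // bool(false), empty strings never pass

// Multi-byte letters: the result may depend on the locale,
// so preg_match() with \p{L} is the safer choice for such input.
var_dump(ctype_alpha('Jörg'));
```

The CType functions should be called with strings; passing integers is deprecated as of PHP 8.1.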

Comments

Is there a way to check against all characters and letters?

Also those that occur in other languages, such as é, è, â, ...
2015-01-12 at 21:04


Good suggestion.

I have added some code to check for arbitrary letters to the article.
2015-01-13 at 01:37
