MySQL: Line Breaks in MySQL
Tip by Stefan Trost | Last update on 2024-06-06 | Created on 2012-07-22
In some programming languages such as PHP or Java, we can use the characters respectively the escape sequences \r and \n for a line break. In JavaScript, however, the escape sequence \n alone represents a complete line break while other programming languages use character codes or provide corresponding functions or constants for inserting line breaks.
But what about MySQL? How can we work explicitly with line breaks using SQL commands, for example to search specifically for text with a line break in a database, to replace text across multiple lines, to write text with line breaks into a table, or to delete or replaceline breaks with another character?
In this article we would like to answer this question with the next sections:
- Basics: Escape Sequences and the CHAR() Function
- Windows Line Breaks
- Unix Line Breaks (Linux and macOS)
- Line Breaks within a String
- Usage of the CONCAT() Function
- Alternative Notation for the Character Codes
- Limitations of the CHAR() Function
- Unicode Line Breaks (NEL, LS, PS, VT and FF)
- Line Breaks in PHP and phpMyAdmin
Basics: Escape Sequences and the CHAR() Function
MySQL basically offers us two ways to specify line breaks: On the one hand, we can use the escape sequences \r and \n known from PHP, and on the other hand, MySQL provides us with the CHAR() function, which can be used with one or more ASCII codes and then returns the corresponding characters. The code respectively code point from the ASCII range for a line feed is 10 (LF), the one for a carriage return is 13 (CR). The corresponding escape sequences are \r for CR as well as \n for LF.
Windows Line Breaks
Accordingly, we can, for example, formulate the following MySQL query for Windows line breaks (CR LF):
UPDATE tab SET col = REPLACE(col, CHAR(13, 10), '');
With this, we replace all line breaks of the type CR LF with nothing - that is, we completely delete all line breaks from the text of the column "col" in the table "tab".
Instead of using the CHAR() function, we can alternatively also work with the escape sequences and insert the string '\r\n' into our query instead of CHAR(13, 10), which means the same thing and leads to the same result:
UPDATE tab SET col = REPLACE(col, '\r\n', '');
By the way, at this point, it doesn't matter whether we use single or double quotes. PHP does make a distinction between '\r\n' and "\r\n" and only replaces the second variant enclosed in double quotes with real line breaks while the first variant remains as such, but MySQL does not differentiate between the type of quotes and replaces both variants with real line breaks (more on this topic in the last section on line breaks in MySQL and PHP).
Unix Line Breaks (Linux and macOS)
Analogously, other types of line breaks can also be processed in the same way by using the corresponding other character codes or escape sequences instead of 13 and 10 respectively \r and \n. For example, line breaks of the Unix type, which are common on Linux or macOS systems, that consist of just the one character LF with the decimal code 10:
UPDATE tab SET col = REPLACE(col, CHAR(10), ' ');
With this UPDATE query we replace all occurrences of this line break type in the column "col" of the table "tab" with a space.
Again, we can alternatively work with escape sequences and use the escape character for the Unix line break \n for LF:
UPDATE tab SET col = REPLACE(col, '\n', ' ');
The result of both of these queries is identical.
Line Breaks within a String
So far, we have used the line breaks only alone and not in conjunction with any other characters within longer strings. Therefore, in the next example, we would like to look at how we can search for a text consisting of several characters including a line break, for example in a column of the type VARCHAR or TEXT with MySQL.
The easiest way to do this is with the help of escape sequences:
SELECT id FROM tab WHERE col = 'Line 1\r\nLine 2';
This query searches for all entries from the table "tab" that contains the two-line text with the first line "Line 1" and the second line "Line 2" in the column "col". For the example, we used a Windows line break. For a Unix line break, we would have to omit the "\r" and just write "Line 1\nLine 2" instead.
Of course, we can alternatively also use the CHAR() function again to implement this query. However, in this case we have to think about how we can connect the result of the CHAR() function with our other string parts, because there are some differences between the various database management systems and configurations.
In many database management systems such as SQLite, Firebird, PostgreSQL or Oracle, the pipe symbol (which is used as OR in MySQL) can be used to connect strings in the following way:
SELECT id FROM tab WHERE col = 'Line 1' || CHAR(13) || CHAR(10) || 'Line 2';
Apart from that, Microsoft SQL Server supports the plus sign to connect multiple strings instead:
SELECT id FROM tab WHERE col = 'Line 1' + CHAR(13) + CHAR(10) + 'Line 2';
However, in MySQL and also in MariaDB, which is often used as an alternative, neither of the two variants works in the presetting. By default, neither MySQL nor MariaDB support + nor || as operators for string concatenations and would return the numeric value 0 for the last example because the plus is considered as an arithmetic operator (though, the || operator can be activated also in MySQL and MariaDB while using the SQL mode PIPES_AS_CONCAT, which is deactivated by default, for example with using the statement "SET sql_mode = (SELECT CONCAT(@@sql_mode, ',PIPES_AS_CONCAT'));").
Without changing the default settings, MySQL and MariaDB use the function CONCAT() for string concatenations, which we will look at in the next section. In addition to MySQL and MariaDB, also Microsoft SQL Server, Oracle and PostgreSQL support this function for concatenating strings.
Usage of the CONCAT() Function
As an alternative to using the double pipe character || or the plus sign + for the string composition, as we saw in the examples from the last section, we can also use the CONCAT() function for the same purpose in MySQL, MariaDB, Microsoft SQL Server, Oracle and PostgreSQL, to which we can pass any number of parameters in the form of single strings. Among others, these single strings can consist of both, text enclosed in quotes as well as single characters generated by CHAR().
The example from the last section can be implemented in the following way using CONCAT():
SELECT id FROM tab WHERE col = CONCAT('Line 1', CHAR(13), CHAR(10), 'Line 2');
As we can see, we can simply separate the individual parts with a comma, which of course also applies to our escape sequences, as our next example shows:
SELECT id FROM tab WHERE col = CONCAT('Line 1', '\r', '\n', 'Line 2');
In contrast to PHP, it is again irrelevant whether we put our escape sequences in single or double quotation marks. SQL accepts both variants, while in PHP only double quotation marks work, which we will discuss in more detail in the last section.
Alternative Notation for the Character Codes
In all previous examples so far, we have always used the CHAR() function with decimal character codes for our line break characters. That is, 13 for CR (Carriage Return) as well as 10 for LF (Line Feed). However, MySQL also allows us to specify these code points in hexadecimal notation. The hexadecimal notation for 13 is 0D, while the hexadecimal notation for 10 is 0A. To signal to MySQL that our specification is a hexadecimal number, we need to prefix it with 0x. Let's look at an example for this:
UPDATE tab SET col = CONCAT('a', CHAR(0x0D), CHAR(0x0A), 'b');
With this SQL command we set the content of the column "col" to a two-line text consisting of the lines "a" and "b". For the Windows line break between the two lines we use the hexadecimal notation CHAR(0x0D) and CHAR(0x0A) instead of CHAR(13) and CHAR(10).
The specification of the Windows line break can be further simplified by contracting the individual characters 0D and 0A to 0x0D0A in the following way and thus only having to call the CHAR() function once:
UPDATE tab SET col = CONCAT('a', CHAR(0x0D0A), 'b');
When using the hexadecimal notation, we can even omit the CHAR() completely, as the next example shows:
UPDATE tab SET col = CONCAT('a', 0x0D0A, 'b');
And depending on the database system (SQLite/Oracle/PostgreSQL/Firebird) or even in the PIPES_AS_CONCAT mode of MySQL and MariaDB, it can also be done without CONCAT():
UPDATE tab SET col = 'a' || 0x0D0A || 'b';
Each of these four different example calls gives us exactly the same result.
With using the decimal notation, the notations shown here would not be possible. Apart from the fact that we cannot write "1310" because, unlike with the hexadecimal notation, it would be unclear that the decimal number "1310" refers to two individual numbers, a call to 'a' + 10 + 13 + 'b' could lead to the two numbers being interpreted as a string or a row of numbers and the result would be (depending on the database management system) the text "a1013b" or a numeric value instead of a line break.
Limitations of the CHAR() Function
Important: We cannot simply call the CHAR() function used here with arbitrary code points to convert arbitrary code points into their character equivalents, but must keep in mind the encoding used and the possibility of resulting multi-byte characters.
It is relatively unproblematic to use code points from the ASCII range with its integer numbers from 0 to 127, which also includes our codes for the Windows line break (13 and 10) as well as the Unix line break (10). For example, you can use CHAR(65) to create the letter "A" (code point U+0041, decimal 65):
SELECT CHAR(65); -- A
However, it becomes more difficult with characters that are represented by multiple bytes in the encoding we are using. For example, if our database uses the standard collation UTF-8, we cannot simply write CHAR(196) to create the letter "Ä". The letter "Ä" has the Unicode code point U+00C4, which corresponds to the decimal number 196, but CHAR() produces a byte sequence that is C3 84 for "Ä" instead of C4 in the UTF-8 encoding. For this reason, we have to pass exactly this byte sequence for the letter "Ä" when using a UTF-8 standard collation of our database (either directly in hexadecimal form via 0xC384 or in decimal form as 50052):
SELECT CHAR(196); -- 0xc4 SELECT CHAR(50052); -- Ä SELECT CHAR(0xC384); -- Ä
If we want to generate characters using code points or byte sequences that deviate from the standard collation of our database, we can also define our preferred character set using the USING extension:
SELECT CHAR(196 USING LATIN1); -- Ä SELECT CHAR(0xC384 USING UTF8); -- Ä SELECT CHAR(0x00C4 USING UTF16); -- Ä SELECT CHAR(0xD0A8 USING UTF8); -- Ш
As the first line shows, with this, it is possible to create the letter "Ä" using the decimal number 196 using the Latin-1 character set, since the letter "Ä" in Latin-1 is only encoded using the one byte C4 (196). Similarly, we can pass the byte sequence C3 84 for "Ä" in the UTF-8 encoding or the byte sequence 00 C4 for "Ä" in the UTF-16 encoding as well as any other byte sequences to generate the corresponding characters.
This basis is important for generating line breaks whose code points are outside the ASCII range, such as the Unicode line break types that we will look at in the next section using the same approach.
Unicode Line Breaks (NEL, LS, PS, VT and FF)
The Unicode standard requires that the characters NEL (Next Line, U+0085), LS (Line Separator, U+2028), PS (Paragraph Separator, U+2029), VT (Vertical Tab, U+000B) as well as FF (Form Feed, U+000C) are also to be interpreted as line breaks. For this reason, we would like to look at how we can use these line break types in our MySQL statements. As already explained in the last section, for this, we can use the CHAR() function again.
The following table shows for all characters with which UTF-8 and UTF-16 byte sequences we can call CHAR() to create the character in question:
Character | Code Point | UTF-8 | UTF-16 | |
VT | Vertical Tab | U+000B | CHAR(0x0B USING UTF8) | CHAR(0x000B USING UTF16) |
FF | Form Feed | U+000C | CHAR(0x0C USING UTF8) | CHAR(0x000C USING UTF16) |
NEL | Next Line | U+0085 | CHAR(0xC285 USING UTF8) | CHAR(0x0085 USING UTF16) |
LS | Line Separator | U+2028 | CHAR(0xE280A8 USING UTF8) | CHAR(0x2028 USING UTF16) |
PS | Paragraph Separator | U+2029 | CHAR(0xE280A9 USING UTF8) | CHAR(0x2029 USING UTF16) |
In practice, we can simply use these characters in our SQL statements, as we have already shown it for the other line break types. For example, to replace all Windows line breaks (specified here using escape sequences) with the NEL character:
UPDATE tab SET col = REPLACE(col, '\r\n', CHAR(0xC285 USING UTF8));
Or to find all records that contain the line separator character LS:
SELECT * FROM tab WHERE col LIKE CONCAT('%', CHAR(0x2028 USING UTF16), '%');
As the examples show, we can use the Unicode line break characters generated with CHAR() individually as usual or combined using the CONCAT() function. The escape sequences for Unicode code points known from many other programming languages, such as \x2028, \u2028 or \U{2028} for the line separator, are not supported by MySQL.
Line Breaks in PHP and phpMyAdmin
Generally, we can use the SQL commands shown here in the same way also within our PHP code or in phpMyAdmin. However, we have to consider that \r and \n are not always converted to the characters U+000A respectively U+000D, but can also remain as such under certain conditions.
The decisive factors here are on the one hand, the type of quotation marks used and on the other hand, the use of the MySQL-typical escape functions mysqli_real_escape_string respectively mysqli::real_escape_string. In the following, we would like to analyze some examples that demonstrate the different behavior of the various combinations of quotation marks and escapes.
First, we would like to have a look at this simple example with which we want to set the content of a column to the text from the variable $txt. $txt consists of two lines that we have separated with the Windows line break \r\n.
$txt = "Line 1\r\nLine 2"; $sql = "UPDATE tab SET col = '$txt' WHERE id = 1";
We defined the text for the variable $txt using double quotation marks. This is crucial at this point, as PHP makes a distinction between double and single quotation marks. Escape sequences such as our line break \r\n are only resolved into real line breaks if they are put in double quotation marks. If they are put in single quotation marks, however, the characters remain as such.
For this reason, in this example, PHP converts \r\n into a real line break even before our string reaches the database. The result is a line break in the database and even if we output $txt with echo or via vardump() a line break is created in the source code of our HTML page (only in the source code, not in the browser display because line breaks and other whitespace are ignored there).
Now we would like to change our example by just one detail: Instead of double quotation marks, we use single ones in our next example:
$txt = 'Line 1\r\nLine 2'; $sql = "UPDATE tab SET col = '$txt' WHERE id = 1";
This leads to the same result in the database but a different result in the source code. While this code again creates a real line break in our database, the string \r\n remains in the source text as such when $txt is output. How can that be? The single quotation marks ensure that PHP does not interpret or convert the escape sequences. This means that \r and \n remain as such in the variable $txt until the end and the output is made accordingly. However, this time the conversion does not take place at the PHP level but at the MySQL level. MySQL only converts \r\n into a line break after receiving the query and writes this into the database.
For security reasons and to avoid interference with some characters like quotes in a text, it is necessary to escape text before we release it to our database. This is usually done in procedural style with the function mysqli_real_escape_string() respectively with mysqli::real_escape_string if we use the object-oriented style.
Let's see how the result of our code changes when we escape our string with mysqli_real_escape_string() before inserting it into the database:
$txt = "Line 1\r\nLine 2"; $txt = mysqli_real_escape_string($db, $txt); $sql = "UPDATE tab SET col = '$txt' WHERE id = 1"; $res = mysqli_query($db, $sql);
This code also results in a real line break in both the database and the source code. The reason for this is that we use double quotation marks in the first line when setting the variable $txt. So, our characters \r and \n have therefore already been converted into real line breaks before escaping, which our escape function can no longer change.
However, things behave different if we call the same code with single quotes for the variable definition:
$txt = 'Line 1\r\nLine 2'; $txt = mysqli_real_escape_string($db, $txt); $sql = "UPDATE tab SET col = '$txt' WHERE id = 1"; $res = mysqli_query($db, $sql);
This time, PHP does not convert \r and \n into line breaks due to the single quotation marks. The output in the source code retains \r\n as we saw in our second example. However, this time MySQL does not convert it either, so our original string is written to the database with \r\n. This is ensured by the escape function in this code, which has converted (escaped) our \r\n to \\r\\n before the UPDATE command, which means MySQL does not convert it into line breaks and writes the re-escaped \r\n into the database.
Up to now, we have written all our queries with double quotation marks, which caused PHP to insert our variable $txt into the query in the form of its contents. For MySQL, such a query looked like this after inserting $txt (here with the lines a and b):
$sql = "UPDATE tab SET col = 'a\r\nb' WHERE id = 1";
However, if we write the entire query in single quotation marks, we have to escape the quotation marks that enclose our text ourselves so that it continues to work:
$sql = 'UPDATE tab SET col = \'a\r\nb\' WHERE id = 1';
This query also creates a line break in the database, but you cannot see it in the source code (that would be 'a\r\nb'). We get the same result on both levels with a double escape:
$sql = 'UPDATE tab SET col = \'a\\r\\nb\' WHERE id = 1';
Only with a triple escape, as we see in the next query, we get the text 'a\r\nb' in the database without a real line break, while in the source code the text 'a\\r\\nb' (also without a line break) is output:
$sql = 'UPDATE tab SET col = \'a\\\r\\\nb\' WHERE id = 1';
In addition to these special cases, it should finally be mentioned that we can also use line breaks as such in our PHP code as well as in the SQL prompt of phpMyAdmin:
$sql = "UPDATE tab SET col = 'Line 1 Line 2' WHERE id = 1";
This query creates a line break in both the source code as well as in the database, regardless of an escape or the CHAR() function.
About the Author
You can find Software by Stefan Trost on sttmedia.com. Do you need an individual software solution according to your needs? - sttmedia.com/contact
Show Profile
Related Topics
Repair Text Files with mixed Line Breaks
Tutorial | 0 Comments
Rewrite Text Files with a fixed Line Length
Tutorial | 0 Comments
PHP: Check Strings with Ctype-Functions for Character Classes
Article | 0 Comments
PHP: Fill Array with Sequence of Numbers or Characters
Tutorial | 0 Comments
How to Replace multiple Texts at the same Time
Tutorial | 0 Comments
PHP: Remove arbitrary Characters at the Beginning and the End of a String
Tutorial | 0 Comments
Important Note
Please note: The contributions published on askingbox.com are contributions of users and should not substitute professional advice. They are not verified by independents and do not necessarily reflect the opinion of askingbox.com. Learn more.
Participate
Ask your own question or write your own article on askingbox.com. That’s how it’s done.