Repair Text Files with mixed Line Breaks
Tutorial by Stefan Trost | Last update on 2022-04-12 | Created on 2022-03-15
Text files usually contain uniform characters for line breaks. Typically, these are the characters CR LF (#0D #0A) in text files created on Windows or the character LF (#0A) in text files stored on a Linux, macOS, or other Unix system.
However, things get complicated when several different line break characters appear within one file. One possible reason is that text files from different systems were joined without first checking which line breaks they use.
In this tutorial, I would like to show you how to deal with such files and how to repair them. For this, we use the program TextEncoder, which can change the line break type of text files.
- First, we open the affected text files in the TextEncoder. To do so, we can simply drag the files onto the program. Any number of files can be processed at once.
- Then we activate the option "Line Breaks" on the right side under "Changes".
- Below that, under "Read as", we choose the option "Line break at each of these code points" from the drop-down box. In the text box below, we enter the code points of all line break types at which a line break should be made. For example, if we have a text file in which the three line break types CR LF (#0D #0A), LS (#2028) and Tab (#09) occur, we can enter them comma-separated as follows: "#0d#0a,#2028,#09". Further explanations of this option follow below this list.
- In the drop-down box under "Save as", we select the uniform new line break type that we want to use for our file, for example the Windows line break CR LF.
- Now we can specify in the "Storage Options" whether we want to overwrite our original files or save the converted files under a new name.
- Then we can click on "Convert and Save" to perform the conversion of all files in the list. All three specified types of line breaks are thus normalized and converted into the uniform line break type CR LF.
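For readers who want to see the idea behind these steps outside of the program, the normalization can be sketched in a few lines of Python. This is only an illustration of the principle and not how the TextEncoder works internally; the function name normalize_line_breaks and its parameters are made up for this example.

```python
import re

def normalize_line_breaks(text: str, breaks: list[str], target: str = "\r\n") -> str:
    """Replace every occurrence of any sequence in `breaks` with `target`."""
    # Try longer sequences first so that "\r\n" is matched as one break
    # and not as "\r" followed by "\n".
    ordered = sorted(breaks, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(b) for b in ordered))
    # A function is used as replacement so that `target` is inserted literally.
    return pattern.sub(lambda _match: target, text)

# The three line break types from the example: CR LF, LS and Tab.
mixed = "one\r\ntwo\u2028three\tfour"
fixed = normalize_line_breaks(mixed, ["\r\n", "\u2028", "\t"])
```

When applying this to real files, the file should be opened with open(path, newline="") for both reading and writing, because otherwise Python translates line breaks on its own.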
The code points in the field "Line break at each of these code points" can be written in various ways. In the example above, we use the hexadecimal notation (for example #0d#0a). The decimal notation (13 10) and the form U+000D U+000A are equally possible. All of these notations can be mixed freely.
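To make the three notations more tangible, here is a small Python sketch that turns such a definition into the actual characters. The function parse_code_points is hypothetical and only mimics the notations described above; how the TextEncoder itself handles unusual or ambiguous input is not covered here.

```python
import re

def parse_code_points(spec: str) -> str:
    """Turn a spec such as "#0d#0a", "13 10" or "U+000D U+000A" into a string."""
    # A token is "#" plus hex digits, "U+" plus hex digits, or plain decimal digits.
    tokens = re.findall(r"#[0-9A-Fa-f]+|[Uu]\+[0-9A-Fa-f]+|\d+", spec)
    chars = []
    for token in tokens:
        if token.startswith("#"):
            chars.append(chr(int(token[1:], 16)))   # hexadecimal notation
        elif token[:2].upper() == "U+":
            chars.append(chr(int(token[2:], 16)))   # U+XXXX notation
        else:
            chars.append(chr(int(token, 10)))       # decimal notation
    return "".join(chars)

# The comma-separated list from the example, one entry per line break type.
breaks = [parse_code_points(part) for part in "#0d#0a,#2028,#09".split(",")]
```

After this call, breaks contains the three sequences CR LF, LS and Tab as ordinary strings.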
Also, we do not have to define the line break characters as code points. In the example, we did so only because we are working with non-visible characters. If we want to read a file with visible line break characters, we can instead select the option "Line break at each of these characters" and enter the characters directly, for example "a,b" when the letters "a" and "b" are our line breaks, or ",",";" when lines are delimited by a comma or a semicolon.
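The effect of treating visible characters as line breaks can again be illustrated with a short Python sketch. The helper split_at_characters is an assumption made for this example and only shows the principle with a comma and a semicolon as separators.

```python
import re

def split_at_characters(text: str, separators: list[str]) -> list[str]:
    """Split `text` wherever one of the given separator strings occurs."""
    # re.escape makes sure that characters such as "." or "|" are taken literally.
    pattern = "|".join(re.escape(sep) for sep in separators)
    return re.split(pattern, text)

# Lines delimited by a comma or a semicolon, saved with CR LF afterwards.
lines = split_at_characters("red,green;blue", [",", ";"])
result = "\r\n".join(lines)
```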
Conversion via the Command Line
The previous explanations describe the conversion via the graphical user interface. With the batch version of the TextEncoder, text files can also be converted via the command line or a script.
Implemented as a command line call that converts the file test.txt, the example from above looks as follows:
TextEncoder.exe -cl test.txt lb-read=customcps-#0D#0A,#2028,#09 lb=crlf
We use the parameter lb-read with the value customcps-#0D#0A,#2028,#09 to control how the file is read and the parameter lb=crlf to save it with the line break type CR LF. Instead of customcps, we can use customstrs in the same way when visible characters serve as line breaks, for example lb-read=customstrs-a,b for the letters "a" and "b".
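If the call is to be made from a script, it could be wrapped roughly as in the following Python sketch. It uses only the parameters shown above. The function names, the assumption that TextEncoder.exe can be found in the working directory or the search path, and the assumption that the program reports errors through its exit code are not taken from the program's documentation.

```python
import subprocess

def build_textencoder_args(path: str, breaks: list[str], target: str = "crlf",
                           exe: str = "TextEncoder.exe") -> list[str]:
    """Assemble the argument list for the call shown in this tutorial."""
    return [exe, "-cl", path,
            "lb-read=customcps-" + ",".join(breaks),
            "lb=" + target]

def convert(path: str, breaks: list[str]) -> None:
    # check=True raises an error on a non-zero exit code (assumed behavior).
    subprocess.run(build_textencoder_args(path, breaks), check=True)

# Only builds the arguments; convert("test.txt", ...) would start the program.
args = build_textencoder_args("test.txt", ["#0D#0A", "#2028", "#09"])
```

Passing the arguments as a list avoids problems with shell quoting, which matters here because "#" and "," may be interpreted differently depending on the shell.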
An explanation of all parameters of the batch version can be found here.
Changing the Line Break Type using the TextConverter
All functions introduced in this tutorial, including the command line functions, are also available in the TextConverter. The TextEncoder used here can only change the line break type or the encoding of files. With the TextConverter, you can additionally edit the content of texts and files.
About the Author
You can find Software by Stefan Trost on sttmedia.com. Do you need an individual software solution according to your needs? - sttmedia.com/contact