Odd CSV File

April 2, 2020

As a lecturer, I wanted to register my students on my Ubuntu machine, and I wanted to do it the fancy way: with a script. So I downloaded the list of my students from our system, opened it in Microsoft Excel, and saved it in CSV format.

It was just another script to parse a CSV file… but I guess it was not my day. The output was erratic. I thought my scripting skills had gone rusty, since the script that split the fields, whether using an array or a plain field separator, kept giving unexpected results.

Then I realized that the CSV file from Microsoft Excel was "UTF-8 Unicode (with BOM) text, with CRLF line terminators". The file command helped me spot this oddity. Using xxd, I saw that the text file began with 3 non-printable bytes. This was odd, since the CSV files I knew never had a file header.
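The same check can be sketched in Python, for anyone without file or xxd at hand. This is only an illustration, not the script from this post: the function name inspect_text_file and the two-line sample are made up, and the sample merely assumes what such an Excel export looks like on disk.

```python
import codecs


def inspect_text_file(raw: bytes) -> dict:
    """Report whether the bytes start with a UTF-8 BOM and count line endings."""
    has_bom = raw.startswith(codecs.BOM_UTF8)  # BOM_UTF8 is b'\xef\xbb\xbf'
    crlf = raw.count(b"\r\n")                  # Windows-style line endings
    lf_only = raw.count(b"\n") - crlf          # Unix-style line endings
    return {"utf8_bom": has_bom, "crlf_lines": crlf, "lf_only_lines": lf_only}


# Hypothetical sample: a BOM followed by two CRLF-terminated lines.
sample = codecs.BOM_UTF8 + b"id,name\r\n1,Alice\r\n"
report = inspect_text_file(sample)

print(sample[:3].hex(" "))  # the three leading bytes, similar to what xxd shows
print(report)
```

For a real file, the bytes would come from open(path, "rb").read() instead of the hard-coded sample.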

I had never come across this kind of file before. BOM stands for Byte Order Mark; it tells the reader (the application, not you) the endianness of the file. However, I found a Stack Overflow answer saying that a BOM in a UTF-8 text file is irrelevant, because UTF-8 has no byte-order variants to distinguish. At first I didn't know why Microsoft Excel for Mac version 16.35 (20030802) gave me this format, but then I realized that the mistake was mine: I should have chosen the regular CSV file format, not "CSV UTF-8 (Comma delimited)".

How do you fix an odd CSV file in UTF-8 Unicode format with a BOM (Byte Order Mark) and CRLF line terminators? The quick fix was simply to run the dos2unix command on the file, and the script was good to go.
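If dos2unix is not installed, the same cleanup can be approximated in a few lines of Python. This is a rough sketch under assumptions, not a replacement for the tool: to_unix_text is a made-up helper name, the sample data is hypothetical, and it only handles a UTF-8 BOM and CRLF line endings.

```python
import codecs


def to_unix_text(raw: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, if present, and convert CRLF to LF."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.replace(b"\r\n", b"\n")


# Hypothetical sample: a BOM followed by two CRLF-terminated lines.
sample = codecs.BOM_UTF8 + b"id,name\r\n1,Alice\r\n"
cleaned = to_unix_text(sample)

# After the cleanup, a plain split on commas gives clean fields.
rows = [line.split(",") for line in cleaned.decode("utf-8").splitlines()]
print(rows)
```

When the parsing itself is done in Python, opening the file with encoding="utf-8-sig" and reading it through the csv module should avoid both problems without a separate conversion step.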

Just another day passed with a simple mistake.
