Please, somebody, add a Unicode header to the spreadsheets!

When I download the CSV file of My Hymnals, one of the tear-my-hair-out annoyances is that the default encoding is ANSI. I have hymnals in a wide variety of writing systems, and it is unspeakably annoying to find their titles turned into gibberish where one ought to see:

Песенник : Песнь возрождения (Pesennik : Pesn' vozrozhdeniya) (Songbook: Song of Rebirth)
Русско-американский сборник: Гимны Христиан - Russian-American Hymnal: Christian Hymns
ကီရ်ပိှသီ [The Church Hymnal With Notes Sagaw Kayin]
こども さんびか [Kodomo Sanbika]
聖歌 [Seika]
聖詩 (shèng shī) Hymnody
讃美歌 (Sanbika)
讃美歌 (Sanbika)
讃美歌 / 讃美歌第二編 (Sanbika/Sanbika Dainihen)
讃美歌 21 (Sanbika 21)
시편찬송가 (The Book of Psalms for Singing)
찬송과 예배 = Chansong gwa yebae = Come, Let Us Worship: the Korean-English Presbyterian hymnal and service book
 

A whole lot of resetting is necessary to get it to show the right characters. Even Spanish, French and German look ridiculous: where Spanish has ¡, the spreadsheet has Â¡; the Welsh hymnal "Mawl a chân" is miswritten "Mawl a chÃ¢n"; and the Mazatec hymnal "Ki̱jndá‑lá Na̱'ín‑ná ‑ Cantemos al Señor (Segunda edición)" is mangled the same way. The header of the file that one downloads should specify Unicode UTF-8. At the moment it presumably doesn't specify an encoding, since practically any other option would be better for other alphabets than the ANSI default.
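The garbling follows a simple pattern: in UTF-8 each accented or non-Latin character is stored as two to four bytes, and a program that assumes a single-byte "ANSI" code page displays each of those bytes as a separate character. A minimal Python sketch reproduces it, assuming Excel's ANSI default is Windows-1252 (the usual code page on English- and Western-European-language Windows; other system locales use other code pages):

```python
# Reproduce the garbling: encode text as UTF-8, then misread those bytes as
# Windows-1252, assumed here to be Excel's "ANSI" on a Western-locale system.
def misread_as_ansi(text: str) -> str:
    """Return what a UTF-8 string looks like when its bytes are decoded as cp1252."""
    return text.encode("utf-8").decode("cp1252")


for title in ["¡", "Mawl a chân", "시편찬송가"]:
    print(f"{title} -> {misread_as_ansi(title)}")
```

The first two lines come out as Â¡ and Mawl a chÃ¢n, matching what the spreadsheet shows. Python's strict cp1252 codec raises an error for the few byte values (such as 0x81) that cp1252 leaves undefined, so not every string can be pushed through this way; these three samples happen to avoid them.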

Please!

The librarians I just discussed this with join me in my plea.

Leland Ross


Comments

Are you opening the CSV in Excel? If you are using Excel on Windows:

1. Open Excel.
2. Open a blank sheet.
3. Go to the “Data” tab.
4. Select “Get Data” > “From File” > “From Text/CSV”.
5. Select the CSV file.
6. Select “65001: Unicode (UTF-8)” for the file origin and “Comma” for the delimiter.
7. Click “Load”.

If you are using Excel on a Mac:

1. Open Excel.
2. Open a blank sheet.
3. Click “File” > “Import”.
4. Select “CSV file”, then click “Import”.
5. Select the CSV file.
6. Select “Delimited” and select “Unicode (UTF-8)” for the file origin, then click “Next >”.
7. Select “Comma” as the delimiter, then click “Finish”.
8. Select a location to import the data.

Thanks, Dianne, but I'm tired of having to remember all those steps and then go through them, especially knowing (I am assured by people who know more than I do) that a simple, short bit of code included by default, similar to the HTML piece I illustrate below, would put those of us using alphabets other than Pure English with no façades or toupées, let alone Arabic or Kanji, on an equal footing. UTF-8 is the almost perfect solution. I'm probably the most diversely scripted user you've got, but I'm sure I'm not the only one, and this issue will cause problems even for people who are just doing Spanish. I know the workaround, but I also know how time-consuming it is even for someone like me who has done it numerous times; and what about the user who has never before run into Korean hangul that looks like "ì‹œíŽ¸ì°¬ì†¡ê°€"? If I hadn't had helpful librarians walk me through it several times, until it became old hat (though still annoying), I would be where I used to be: going through my list line by line, highlighting and replacing each failed foreign letter with the correct one, one at a time. By the time my two hours on the library computer were up, I might have a copy I could post publicly without getting weird comments.

When I create a webpage, the code immediately after the beginning of the head section tells the world's computers what encoding to use. It looks like this:

<html>
  <head>
    <meta charset="UTF-8">
    <!-- the title, stylesheets, etc. follow -->
  </head>
</html>

Neither I nor any techier person I've run this by seriously believes it can be much harder to do the same by default for the Hymnary system of CSV creation. If it really is impossible, let me know and I will redirect my complaint to Microsoft or whoever came up with CSVs. (I'm untechy enough that I don't know why we have spreadsheets that are CSV and others that are XML, which the Seattle Public Library warns me every time are dangerous, asking if I'm sure I want to open such a thing, instead of just Excel (xls/xlsx) formats.)
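For CSV files, the nearest equivalent of <meta charset="UTF-8"> is a UTF-8 byte order mark (BOM): the three bytes EF BB BF placed at the very start of the file. Excel on Windows generally treats a CSV that begins with them as UTF-8 when the file is opened directly. How Hymnary actually generates its export isn't described here, so the following is only a hypothetical Python sketch; the file name and sample rows are made up for illustration. Python's `utf-8-sig` codec writes the BOM automatically:

```python
import csv

# Hypothetical sample rows; a real export would come from the hymnal database.
rows = [
    ["Title"],
    ["시편찬송가 (The Book of Psalms for Singing)"],
    ["Mawl a chân"],
]

# "utf-8-sig" writes the UTF-8 byte order mark (EF BB BF) before the data,
# which Excel uses as a signal that the file is UTF-8 rather than ANSI.
# newline="" lets the csv module control line endings itself, as its docs advise.
with open("my_hymnals.csv", "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

# Check the first three bytes of the written file: they should be the BOM.
with open("my_hymnals.csv", "rb") as f:
    print(f.read(3))
```

The rest of the file is ordinary UTF-8, so LibreOffice and Google Sheets read it as before; the BOM only adds a signal for programs that would otherwise guess ANSI.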

It's reminiscent in a way of the JP2s that were making my life purgatory for a while until you (mostly) fixed the problem. It still crops up from time to time, but it's no longer ubiquitous the way it was.

Our CSVs are encoded with UTF-8. The problem is that Excel assumes ANSI by default, which is why you need to specify manually in Excel that the file is encoded with UTF-8. LibreOffice and Google Sheets default to UTF-8, so they should open the CSVs correctly.
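Since the downloaded file is already UTF-8, a one-time fix on the user's side is to add a UTF-8 byte order mark (BOM), the three bytes EF BB BF, at the front of the file, which Excel on Windows generally uses to recognize UTF-8 when the file is simply double-clicked. A minimal Python sketch, assuming the file really is UTF-8 as stated above; the function name `add_utf8_bom` is just an illustrative choice:

```python
import tempfile
from pathlib import Path

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark


def add_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM to a UTF-8 file in place.

    Returns False (and leaves the file alone) if a BOM is already present.
    Raises UnicodeDecodeError if the file is not valid UTF-8.
    """
    data = path.read_bytes()
    if data.startswith(BOM):
        return False
    data.decode("utf-8")  # sanity check: refuse to mark non-UTF-8 data
    path.write_bytes(BOM + data)
    return True


# Demo on a throwaway file; point add_utf8_bom at your downloaded CSV instead.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes("Mawl a chân\n".encode("utf-8"))
    print(add_utf8_bom(sample))  # True: BOM added
    print(add_utf8_bom(sample))  # False: already present
```

Checking for an existing BOM first makes the fix safe to run twice on the same file.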

The librarians misled me, then. I'll let them know, and I'll complain to Microsoft. When I get my own computer hooked back up again, I'll have LibreOffice or the like and the problem will go away! Yay! But seriously, what kind of mental block would lead Microsoft to give Excel such parochial encoding?! (And I mean "parochial" in the less-than-laudatory sense, not "parish-owned [school]"!)