Catalogue language frequencies

One of our systems librarians wanted to create a language filter for the library catalogue and asked me for a list of codes. I replied with the MARC21 language code list, while recognising that this isn’t really very useful. So I offered to compile a list of common codes, thinking that this would be a matter of common sense and wouldn’t be very long. However, reality and the need to take politics into account, together with various specialist collections and institutes with particular language biases, made the list rather long. I sorted the list by the number of records we have for each language, which meant we could apply an objective cut-off. It’s still difficult, as some of our prestige collections, such as Hebrew, which I would have included in any list without thinking, don’t turn up as often as I would have expected. On the flip side, you can tell we recently merged with a specialist Eastern European studies institute by the second most common language on the list, which I reproduce below, although with the actual numbers of records omitted:

  1. English
  2. Russian
  3. German
  4. French
  5. Italian
  6. Polish
  7. Dutch
  8. Spanish
  9. Czech
  10. Hungarian
  11. Swedish
  12. Latin
  13. Norwegian
  14. Danish
  15. Finnish
  16. Hebrew
  17. Yiddish
  18. Bulgarian
  19. Croatian
  20. Icelandic
  21. Romanian
  22. Slovak
  23. Ukrainian
  24. Serbian
  25. Estonian
  26. Lithuanian
  27. Portuguese
  28. Latvian
  29. Greek, Ancient
  30. Belarusian
  31. Macedonian
  32. Slovenian
  33. Albanian
  34. Greek, Modern
  35. Welsh
  36. Afrikaans
  37. Turkish
  38. Catalan
  39. English, Middle
  40. Chinese
  41. Arabic
  42. English, Old
  43. Moldovan

However, I will say that English was about 10 times more common than Russian, with the frequencies declining gracefully thereafter. Taking the Eastern European languages out of the list, I am still surprised by German coming second rather than French. I suspect the Second World War has made us largely forget the importance of German as a cultural and academic language, e.g. in literature, archaeology, medicine, and philosophy (and probably Eastern European studies).

The list is also quite badly skewed by errors and idiosyncrasies in coding in the 008 field. For example, English (eng), the default in templates, is often left there by mistake, the 041 is rarely entered fully, and one language I left off the list, Faroese, is represented in our catalogue by two codes, one of them wrong. Nevertheless, I think it is interesting.
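
For anyone wanting to try something similar, here is a minimal sketch in Python of the counting and cut-off described above. It is not the method I actually used, and it makes some assumptions: that the 008 fields have already been exported from the catalogue as plain 40-character strings, and that only the language code in positions 35-37 of the 008 is consulted (the 041 is ignored). The names make_008, language_code and language_frequencies are made up for illustration.

```python
from collections import Counter


def make_008(lang):
    """Build a dummy 40-character 008 for testing (hypothetical helper).

    Everything except the language code is filled with '|'.
    """
    return "|" * 35 + lang + "|" * 2


def language_code(field_008):
    """Return the code at positions 35-37 of an 008, or None if it is too short."""
    if len(field_008) < 38:
        return None
    return field_008[35:38]


def language_frequencies(fields_008, min_records=1):
    """Count language codes and keep those with at least min_records records.

    The result is a list of (code, count) pairs, most common first.
    """
    counts = Counter()
    for field in fields_008:
        code = language_code(field)
        if code is not None:
            counts[code] += 1
    return [(code, n) for code, n in counts.most_common() if n >= min_records]


if __name__ == "__main__":
    # Invented sample data, not figures from our catalogue.
    sample = [make_008("eng")] * 3 + [make_008("rus")] * 2 + [make_008("fao")]
    for code, n in language_frequencies(sample, min_records=2):
        print(code, n)
```

Note that a count like this inherits all the coding problems mentioned above: a default eng left in a template is counted as English, and a language entered under two different codes is counted as two languages.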