The Computer Oracle

Why does VIM show the Unicode code point and not the UTF-8 code value?

Full question
https://superuser.com/questions/786743/w...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#vim #encoding #unicode #utf8




ACCEPTED ANSWER

Score 17


Why is the Unicode code point being displayed and not the UTF-8 code value?

Because you use ga:

<”> 8221, Hex 201d, Octal 20035

instead of g8:

e2 80 9d
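
To make the difference concrete, here is a minimal Python sketch (not Vim's own code) that reproduces both outputs for the same character. The variable names are only illustrative.

```python
# U+201D RIGHT DOUBLE QUOTATION MARK, the character from the question.
ch = "\u201d"

# What `ga` reports: the code point in decimal, hex, and octal.
code_point = ord(ch)
ga_line = f"<{ch}> {code_point}, Hex {code_point:04x}, Octal {code_point:o}"

# What `g8` reports: the bytes of the character's UTF-8 encoding.
g8_line = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))

print(ga_line)  # <”> 8221, Hex 201d, Octal 20035
print(g8_line)  # e2 80 9d
```

The code point is a property of the character itself; the byte sequence only exists once an encoding has been chosen.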



ANSWER 2

Score 13


Because Vim is a text editor and works with text codepoints, not bytes. There is more than just one translation happening – when opening a file, the editor must decode it from the byte encoding to an internal representation (usually Unicode); when saving back to a file, or when displaying its contents on the terminal, the editor must encode the text back to bytes.

One reason for this is simple – the file and the terminal might be using different character sets. For example, you're editing some old documents in ISO 8859-13 or KOI8-R, and want them to show up correctly on a UTF-8 terminal.
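
The decode-then-encode round trip can be sketched in Python as follows. This is only an illustration of the idea, not how Vim is implemented; the sample word and variable names are assumptions made for the example.

```python
# Stand-in for the bytes of an old KOI8-R file (hypothetical sample text).
file_bytes = "Привет".encode("koi8_r")

# Step 1: decode the file's bytes into text (a sequence of code points).
text = file_bytes.decode("koi8_r")

# Step 2: encode that text for a UTF-8 terminal.
terminal_bytes = text.encode("utf-8")

print(len(text))            # 6 characters
print(len(file_bytes))      # 6 bytes in KOI8-R
print(len(terminal_bytes))  # 12 bytes in UTF-8
```

The text is the same six characters throughout; only the byte representation changes between the file and the terminal.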

The second reason, again, is that text editors work with text. For example, ” is one character and its width is one terminal cell, regardless of its byte encoding (3 bytes in UTF-8, 1 byte in Windows-1257, 2 bytes in Shift-JIS, and so on). If Vim merely counted it as three bytes but the terminal showed it as one cell, the result would be misaligned vertical splits, lines wrapped too soon, tabs appearing too narrow, and so on.
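
The byte counts quoted above can be checked with a short Python sketch. It assumes Python's `cp1257` and `shift_jis` codecs as stand-ins for Windows-1257 and Shift-JIS.

```python
# U+201D RIGHT DOUBLE QUOTATION MARK: one character, one code point.
ch = "\u201d"

# Encoded length of the same character in three different encodings.
sizes = {enc: len(ch.encode(enc)) for enc in ("utf-8", "cp1257", "shift_jis")}

print(len(ch))  # 1
print(sizes)    # {'utf-8': 3, 'cp1257': 1, 'shift_jis': 2}
```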

Instead of this...                ...you would see this.

┌───────────────────────────┐     ┌───────────────────────────┐
│She said, "Hello."         │     │She said, "Hello."         │
│                           │     │                           │
│She said, “Hello.”         │     │She said, “Hello.”     │
│                           │     │                           │
│Ji pasakė, „Sveiki“.       │     │Ji pasakė, „Sveiki“. │
└───────────────────────────┘     └───────────────────────────┘
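
The misalignment in the right-hand box can be reproduced with a small Python sketch. The helper `pad_line` and the two measuring functions are hypothetical names for this example, and counting characters is itself a simplification: a real editor has to account for double-width and combining characters as well.

```python
def pad_line(text, width, measure):
    """Pad text with spaces up to `width`, using `measure` to size it."""
    return text + " " * max(0, width - measure(text))

def char_count(text):
    # One unit per character (code point).
    return len(text)

def byte_count(text):
    # One unit per UTF-8 byte, which overcounts non-ASCII characters.
    return len(text.encode("utf-8"))

line = "She said, \u201cHello.\u201d"
by_chars = pad_line(line, 27, char_count)
by_bytes = pad_line(line, 27, byte_count)

print("│" + by_chars + "│")  # right border lands in column 27
print("│" + by_bytes + "│")  # right border lands four cells too early
```

Each curly quote is one character but three UTF-8 bytes, so the byte-based measurement believes the line is four cells longer than it is and adds four fewer spaces.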

Not to mention, you'd have to Backspace three times to delete a single character.
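
As a rough illustration of that last point, the following Python sketch compares deleting one character with deleting one byte. The sample string is an assumption for the example.

```python
text = "Hello.\u201d"

# Character-based backspace: one press removes the whole closing quote.
char_result = text[:-1]

# Byte-based backspace: one press removes only the last of three bytes.
raw = text.encode("utf-8")
byte_result = raw[:-1]

# The remaining bytes end in a truncated sequence, so they no longer decode.
try:
    byte_result.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False

# Number of byte-wise presses needed to remove the quote completely.
presses_needed = len(raw) - len(char_result.encode("utf-8"))

print(char_result)     # Hello.
print(still_valid)     # False
print(presses_needed)  # 3
```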