Bernhard Häussner
Tags: Perl

Fix Subversion error: Valid UTF-8 data followed by invalid UTF-8 sequence

23.06.2010, 16:53

This post describes a fix for an SVN problem where you can't update your working copy for no apparent reason. All you get is an obscure error message like this:

svn: Valid UTF-8 data
(hex: 65 64 69 74 65 64)
followed by invalid UTF-8 sequence
(hex: ad 6c 69 73)

This error appears not only during svn update but also when running svn status.

Since Subversion can handle binary files, this is quite confusing. Luckily, after some googling I found out that these errors are caused by file names containing, for example, Chinese characters.

Unfortunately the error message can't display the corrupt file name because it contains non-UTF-8 data. So I had to decode the hex dump myself: the "Valid UTF-8 data" (in this case the hex sequence 0x65, 0x64, 0x69, 0x74, 0x65, 0x64) translates to the string "edited" according to any ASCII or UTF-8 table.
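
The decoding step described above can also be reproduced on the command line instead of looking the bytes up in a table. The following is a minimal sketch, assuming a POSIX-style shell; hex_to_bytes is a hypothetical helper name introduced here for illustration and is not part of Subversion.

```shell
# hex_to_bytes (hypothetical helper): turn a space-separated hex dump,
# as printed in the svn error message, back into raw bytes.
hex_to_bytes() {
    for byte in $1; do
        # Convert each hex pair to a three-digit octal escape,
        # which printf understands in every POSIX shell.
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

hex_to_bytes "65 64 69 74 65 64"; echo    # prints: edited
```

Feeding it the second dump (ad 6c 69 73) yields the raw byte 0xAD followed by "lis", which a UTF-8 terminal will probably not display cleanly.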

Since far too many file names contained this string, I had to search for the 0xAD 0x6C sequence instead. Read as a single 16-bit code point, this would be the Korean character 구 (U+AD6C), but you can't simply grep for that character, because the file name does not store it as UTF-8. However, we can look for the raw byte sequence with a bit of Perl magic:

find /path/to/workingcopy | perl -ne 'print if /\xAD\x6C/' | less

Note the hex escapes in the regular expression, which match raw bytes in the file names. The command outputs a nice (and in this case rather short) list of the offending paths.

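If Perl is not at hand, the same byte search might also work with grep once the locale is forced to the byte-oriented C locale. This is only a sketch, assuming GNU grep (for the -a option); names_with_bytes is a hypothetical helper name, and the two bytes are the ones from the error message above.

```shell
# names_with_bytes (hypothetical helper): read paths on stdin and keep
# only those that contain the raw byte sequence 0xAD 0x6C.
names_with_bytes() {
    # Build the two raw bytes from octal escapes (0xAD = \255, 0x6C = \154).
    pattern=$(printf '\255\154')
    # LC_ALL=C makes grep compare bytes instead of decoding characters;
    # -a treats the input as text even though it contains binary data;
    # -F searches for the pattern as a fixed string, not a regex.
    LC_ALL=C grep -a -F -- "$pattern"
}

# Usage (path as in the command above):
#   find /path/to/workingcopy | names_with_bytes | less
```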

Interestingly, less makes the binary data visible instead of hiding it. Now you just have to rename the file, and you can update your working copy again.
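
Typing such a file name by hand is awkward, so the rename can be scripted. The sketch below assumes that the offending file is not yet under version control (so a plain mv is enough) and that simply dropping every non-ASCII byte gives an acceptable new name; ascii_name and rename_to_ascii are hypothetical helper names, not Subversion commands.

```shell
# ascii_name (hypothetical helper): drop every byte outside
# printable ASCII (space through tilde) from a string.
ascii_name() {
    printf '%s' "$1" | LC_ALL=C tr -cd '\040-\176'
}

# rename_to_ascii (hypothetical helper): rename one file so that
# its base name contains only printable ASCII characters.
rename_to_ascii() {
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    new=$(ascii_name "$base")
    if [ "$new" = "$base" ]; then
        return 0    # name is already clean, nothing to do
    fi
    # Never overwrite an existing file or rename to an empty name.
    if [ -z "$new" ] || [ -e "$dir/$new" ]; then
        echo "refusing to rename: $1" >&2
        return 1
    fi
    mv -- "$1" "$dir/$new"
}
```

Calling rename_to_ascii on each path reported by the search above should be enough; afterwards svn status ought to run cleanly again.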

© 2008-2018 by Bernhard Häussner