Change file character encoding
November 3rd, 2008
I have a few files in EUC-JP that I want to edit. Unfortunately for me, text editors these days have good Unicode support but do not necessarily support other codepages.
A long time ago (computer-wise), iconv was created to handle exactly this situation. The only problem is that it’s only a library.
PHP has an iconv interface, and so does Ruby (the Iconv class).
Here is an example of using the latter. It reads in a file (hardcoded, in this case), converts it from EUC-JP to UTF-8, updates the encoding declared inside the file (the replacement could be more sophisticated), and writes the result out to disk.
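require 'iconv'

# Read the EUC-JP file as raw bytes ("rb" avoids newline translation on Windows)
input = File.open("C:/2879.html", "rb") { |file| file.read }

# Convert from EUC-JP to UTF-8
r = Iconv.new("UTF-8", "EUC-JP").iconv(input)

# Update the charset declaration (naive: replaces every "EUC-JP" in the file)
r.gsub!("EUC-JP", "UTF-8")

# The block form closes, and therefore flushes, the output file
File.open("C:/2879.1.html", "wb") { |outfile| outfile.print r }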
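Iconv was deprecated in Ruby 1.9.3 and dropped from the standard library in Ruby 2.0 (it survives as a separate gem), so on a newer Ruby the same job can be done with String#encode. Below is a minimal sketch, assuming Ruby 1.9.3 or later and a source file that is valid EUC-JP; convert_euc_jp_file is a hypothetical helper name, not part of any library. It also swaps the blanket gsub! for a case-insensitive regex that only rewrites the name after charset= or encoding=, so other mentions of "EUC-JP" in the text are left alone.

```ruby
# Hypothetical helper: converts an EUC-JP file to UTF-8 without the iconv library.
# Assumes Ruby 1.9.3+ (File.binread / File.binwrite / String#encode).
def convert_euc_jp_file(src, dst)
  # Read the raw bytes and label them as EUC-JP (no conversion happens yet)
  euc = File.binread(src).force_encoding("EUC-JP")

  # Transcode to UTF-8; invalid EUC-JP bytes raise Encoding::InvalidByteSequenceError
  utf8 = euc.encode("UTF-8")

  # Rewrite only the declared encoding, case-insensitively, e.g.
  #   <meta ... charset=EUC-JP">  or  <?xml ... encoding="euc-jp"?>
  utf8 = utf8.gsub(/((?:charset|encoding)\s*=\s*["']?)euc-jp/i, '\1UTF-8')

  # Write the UTF-8 bytes out unchanged (binary mode, no newline translation)
  File.binwrite(dst, utf8)
  utf8
end
```

With the paths from above, the call would be convert_euc_jp_file("C:/2879.html", "C:/2879.1.html"). If the input might contain stray invalid bytes, passing invalid: :replace, undef: :replace to encode substitutes them instead of raising.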
Note that I’m using forward slashes in the Windows paths, which Ruby accepts. I could have used “C:\\2879.html” instead, but I hate having to escape backslashes in paths.
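If the backslash form is ever needed, a single-quoted Ruby literal sidesteps most of the escaping, because single quotes only treat \\ and \' as escapes. A small illustration (the variable names are just for this example):

```ruby
# Double quotes: the backslash itself has to be escaped
escaped = "C:\\2879.html"

# Single quotes: "\2" is kept as a literal backslash followed by "2"
single_quoted = 'C:\2879.html'

p escaped == single_quoted # prints true
```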
Posted in Ruby