09 November 2004
This post may be outdated because it was written in 2004. The links may be broken, and the code may no longer work. Leave a comment if needed.
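Questions:
  1. Convert a file in GB2312 encoding to UTF-8
  2. Display a GB2312 data file on a page served as UTF-8
  3. Generate UTF-8 data files directly
  4. No file reading at all: print "中文"; inside the script file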
The key line is really just this one:
use Encode qw/encode decode/;
my $utf_data = encode("utf8", decode("gb2312", $data));
# $data holds GB2312 bytes; $utf_data holds the same text as UTF-8 bytes
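To see what this line does to actual bytes, here is a minimal sketch. The input is the GB2312 encoding of "中文", written as hex escapes (an assumption made only so the example does not depend on how the script file itself is saved). decode turns the two-byte GB2312 sequences into Perl characters, and encode writes each character back out as three UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "中文" in GB2312: two bytes per character
my $data = "\xD6\xD0\xCE\xC4";

# GB2312 bytes -> Perl characters -> UTF-8 bytes
my $utf_data = encode("utf8", decode("gb2312", $data));

# show each output byte in hex
print join(" ", map { sprintf "%02X", ord } split //, $utf_data), "\n";
# prints: E4 B8 AD E6 96 87
```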
Equivalent code:
use Encode qw/from_to/;
from_to($data, "gb2312", "utf8");
# converts $data in place from GB2312 to UTF-8
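One practical difference: from_to overwrites its first argument, so keep a copy if the GB2312 bytes are still needed. It returns the length of the converted string in octets, or undef on failure. A small sketch comparing it with the encode/decode form, again using hex escapes for "中文" as an assumed input:

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

my $data = "\xD6\xD0\xCE\xC4";   # "中文" in GB2312
my $copy = $data;                 # from_to will modify this one

# rewrites $copy in place; returns the new length in octets
my $len = from_to($copy, "gb2312", "utf8");
die "conversion failed" unless defined $len;

# same result as the encode/decode line
my $same = encode("utf8", decode("gb2312", $data));
print "length $len, identical: ", ($copy eq $same ? "yes" : "no"), "\n";
# prints: length 6, identical: yes
```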
The reverse, UTF-8 to GB2312, works too: just swap the encoding names passed to encode and decode.
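A minimal sketch of the reverse direction with a round-trip check, assuming the UTF-8 input is "中文" given as hex escapes:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $utf_data = "\xE4\xB8\xAD\xE6\x96\x87";   # "中文" as UTF-8 bytes

# swap the names: decode UTF-8, encode GB2312
my $gb_data = encode("gb2312", decode("utf8", $utf_data));

# converting back should reproduce the input exactly
my $round_trip = encode("utf8", decode("gb2312", $gb_data));
printf "%d GB2312 bytes, round trip %s\n", length $gb_data,
    ($round_trip eq $utf_data ? "ok" : "mismatch");
# prints: 4 GB2312 bytes, round trip ok
```

Characters that GB2312 cannot represent are replaced with a substitution character by default; passing Encode::FB_CROAK as a third argument to encode makes it die instead.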

Example and code:
gb.dat is a GB2312-encoded file. Served with -charset=>'utf-8' it shows up garbled; with gb2312 it displays correctly.
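Why the page is garbled can be checked without a browser: the same bytes that are valid GB2312 are not valid UTF-8. A small sketch, assuming the GB2312 bytes for "中文" as a stand-in for the contents of gb.dat:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

my $gb = "\xD6\xD0\xCE\xC4";   # "中文" as stored in a GB2312 file

# read as GB2312, the bytes are two valid characters
my $as_gb = decode("gb2312", $gb);

# read as UTF-8 they are malformed: 0xD6 starts a two-byte sequence,
# but 0xD0 is not a continuation byte (those are 0x80-0xBF)
my $as_utf8 = eval { decode("UTF-8", $gb, FB_CROAK | LEAVE_SRC); 1 };

print "as GB2312: ", length($as_gb), " characters\n";
print "as UTF-8: ", ($as_utf8 ? "valid" : "invalid, so the page is garbled"), "\n";
```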


#!/usr/bin/perl
use strict;
use warnings;
use CGI::Carp qw(fatalsToBrowser);
use CGI qw/:standard/;
use Encode qw/encode decode from_to/;

my $cgi = CGI->new;
# send the page as UTF-8
print $cgi->header(-type=>'text/html', -charset=>'utf-8');

# read the whole GB2312 file as raw bytes (not just the first line)
open(my $in, '<:raw', 'gb.dat') or die "Cannot open gb.dat: $!";
my $data = do { local $/; <$in> };
close($in);

# convert GB2312 to UTF-8
my $utf_data = encode("utf8", decode("gb2312", $data));

# write the UTF-8 file
open(my $out, '>:raw', 'utf8.dat') or die "Cannot write utf8.dat: $!";
print $out $utf_data;
close($out);

# this literal is GB2312 only if the script itself is saved in GB2312
my $word = "我是中国人";
from_to($word, "gb2312", "utf8");

print "$utf_data, $word";
Once converted, utf8.dat can be read and printed directly under -charset=>'utf-8' without another decode/encode.
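For question 1 on its own, the file conversion can also be done with PerlIO :encoding() layers instead of explicit decode/encode calls. This is a different technique from the one above, shown as a minimal sketch; convert_file is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $src to $dst, re-encoding from $from to $to.
# The :encoding() layers decode on read and encode on write, so the
# loop body only ever sees Perl character strings.
sub convert_file {
    my ($src, $dst, $from, $to) = @_;
    open(my $in,  "<:encoding($from)", $src) or die "Cannot open $src: $!";
    open(my $out, ">:encoding($to)",   $dst) or die "Cannot write $dst: $!";
    while (my $line = <$in>) {
        print $out $line;
    }
    close($in);
    close($out) or die "Cannot close $dst: $!";
    return 1;
}
```

Usage, with the file names from the example: convert_file("gb.dat", "utf8.dat", "gb2312", "UTF-8");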

No file reading at all: print "中文"; inside the script file

In script.pl, declare the source and output encodings with use encoding "euc-cn", STDOUT => "utf8";
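use strict;
use CGI qw/:standard/;
# script.pl itself must be saved in GB2312 (euc-cn): literals are read
# as euc-cn and STDOUT re-encodes them as UTF-8
use encoding "euc-cn", STDOUT => "utf8";

my $cgi = CGI->new;
print $cgi->header(-type=>'text/html', -charset=>'utf-8');

print "我是中国人,我爱野文";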
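The encoding pragma was later deprecated (as of Perl 5.18), so on a newer Perl the same output can be produced without it: decode the GB2312 text explicitly, then let an output layer write UTF-8. This is a minimal sketch; the CGI header line is omitted, and the string is written as hex escapes (an assumption) so the example runs regardless of how the file is saved. In a script actually saved as GB2312, the literal could be passed through decode("gb2312", ...) the same way.

```perl
use strict;
use warnings;
use Encode qw(decode);

# GB2312 bytes for "我是中国人", as escapes so the sketch does not
# depend on the encoding of the script file
my $gb_bytes = "\xCE\xD2\xCA\xC7\xD6\xD0\xB9\xFA\xC8\xCB";

# decode once into a Perl character string
my $text = decode("gb2312", $gb_bytes);

# the output layer encodes characters as UTF-8 on the way out
binmode(STDOUT, ':encoding(UTF-8)');
print $text, "\n";
```

If the script is saved as UTF-8 instead, use utf8; makes the literal a character string directly and the decode step is unnecessary.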
