PerlTidy

30 October 2004


Perltidy is a tool to indent and reformat Perl scripts. It can also write scripts out as syntax-colored HTML.
Homepage: http://perltidy.sourceforge.net

Install

Download PerlTidy-20031021, then:
perl Makefile.PL
(n)make test
(n)make install
If (n)make fails, you can do the following:
  • Copy Perl/Tidy.pm from the package's lib directory to $Perl/site/lib, and copy perltidy from the bin directory to $Perl/lib. Run
    $Perl\bin>perl perltidy testfile.pl
    
  • Or look at the pm2pl file: that program merges Tidy.pm and perltidy into a single new self-contained perltidy Perl file
    >perl pm2pl testfile.pl
    
The downloaded package also has an examples directory; some of the scripts in it are worth a look.

Usage

perltidy -html testfile.pl
This command generates a syntax-colored HTML file.
The colors may not be entirely to your taste, but you can edit the CSS styles in the output, or supply your own stylesheet by adding the -css=mystyle.css option.
There are plenty of other nice options, such as -nnn; see perldoc Perl::Tidy for more help.
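
For example, combining the options mentioned above (output file names follow perltidy's defaults; testfile.pl stands in for your own script):

perltidy testfile.pl
# -> writes the reformatted script to testfile.pl.tdy
perltidy -html -css=mystyle.css testfile.pl
# -> writes colored HTML to testfile.pl.html, using your own stylesheet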

Parrot_in_a_Nutshell (translated)

30 October 2004


2004/8/17 20% unfinished.
2004/8/18 5% unfinished, with bloves.

Parrot in a Nutshell

By Dan Sugalski, [email protected]

English original, English PDF download

What is Parrot

  • The interpreter for Perl 6
  • A multi-language virtual machine (VM)
  • An April Fool's joke that got out of hand (and is fast coming true)

Parrot, the official story

  • Supports Perl 6
  • Builds a faster perl
  • Builds a cleaner perl

Parrot, the unofficial story

  • A generically fast interpreter engine
  • An annoyance for the Mono folks
  • A real annoyance for the Python folks
  • An attempt to take over the world

Parrot's guiding goals

Speed

  • The faster the better
  • A slow interpreter is useless
  • It's nearly impossible to be too fast
  • Parrot tunes itself for maximum speed

Maintainability

  • Try to keep certain things invisible
  • Try to enforce a single coding standard
  • Open Source projects have only limited resources to spend on design
  • It's nearly impossible to be too maintainable
  • For projects that actually get finished, maintenance always consumes the most resources

Longevity

  • Software stays in service far longer than you ever expect
  • Designing in long-term extensibility postpones the inevitable rewrite
  • If you'll have to look at this code again later, better make sure now that it's readable

Extensibility

  • Everyone wants to add something in C
  • It's never fast enough as it is
  • There are plenty of external libraries worth using
  • A good extension mechanism is a wonderful thing
  • A bad one is sheer misery

Flexibility

  • General solutions have more long-term potential than special-case ones
  • People always do things you never expected
  • Designing in flexibility now means spending less time later
  • That's worth it even if it costs a little more time now

Multi-language friendliness

Supporting perl

  • Perl 6 is the reason all of this started
  • Perl has the most complex semantics of all the common dynamic languages
  • As the language evolves, implementing the Perl interpreter is the main headache

Supporting the common dynamic languages

  • That includes Ruby and Python, for example
  • Languages with a certain amount of compile-time indeterminacy
  • Actually fairly trivial; it just takes a bit of thought

Being a nice compilation target

  • If we can express all their semantics, language designers have an easier time of it
  • Many people currently target C or GCC
  • For many languages the impedance mismatch will be lower

Playing Zork natively

  • Yes, you heard me right
  • parrot -b:zmachine zork.dat
  • Harder to handle properly than Java, Python, or .NET bytecode
  • If we can handle this properly, we can handle anyone's bytecode
  • Besides, it's really cool

Elements of an interpreter

A CPU in software

  • That's the "V" in virtual machine (VM)
  • Compiler writers can treat the interpreter as a sort of CPU chip
  • Allows, and eases, customizing the "core" operations
  • Sometimes software can behave like hardware
  • But more often it can't

Flexible, generally

  • Once core functionality lives in software, it can be changed
  • Software generally puts fewer limits on flexibility than hardware does
  • Most people find software easier to reason about than hardware
  • The marginal cost of changes is lower

Slow, generally

  • There's a layer of indirection between the user's program and the hardware running it
  • It's very easy to pick up impedance mismatch problems
  • It's also very easy to let the speed drain away

Easy to write for

  • It's easy to hide the hard parts
  • An interpreter should be able to describe itself as "easy to target"
  • That should be your goal; if it isn't, you're doing something wrong

Easy to write

  • An interpreter is pure semantics, and semantics are easy to express
  • Defining the semantics, though, takes a little longer
  • A simple interpreter can be put together in a week
  • (Parrot, however, is far from "simple")
  • Complex or not, it's just a SMOP (a Simple Matter Of Programming)

Easy to port

  • Most of the semantics being expressed are platform-independent
  • Which means the platform-specific pieces stay isolated
  • Platform-specific features that aren't implemented can be emulated
  • Usually only a small piece of the code really needs the work

Parrot's core concepts

Register-based

  • 32 each of Integer, String, Float, and PMC (Parrot Magic Cookie) registers
  • To the interpreter, a register is a named temporary slot
  • Modeled on hardware CPUs
  • Generally more efficient than a purely stack-based design

Language-neutral

  • We don't force users onto ASCII, Unicode, Latin-1, or EBCDIC
  • Whatever encoding the data arrives in, the engine has the tools to handle it
  • That matters, since most of the world's text processing happens in native languages

High-level

  • Threads, closures, continuations, aggregate operations, and multimethod dispatch are all core elements
  • The hard stuff isn't shoved off into libraries
  • Makes compiler writers' lives easier
  • A bit of a pain for the interpreter implementation, though

Introspective

  • Code can examine the current state of the interpreter
  • Most, if not all, core data structures are reachable
  • Including stack entries, symbol tables, and lexical variables
  • Likewise variable arenas, memory pools, and interpreter data

Malleable

  • Code can be created on the fly
  • Libraries can be loaded automatically
  • Opcodes can be redefined on the fly
  • Base variable methods can be overridden at runtime

Multithreaded

  • Threading is a core element of Parrot
  • A multithreaded interpreter is an interesting task
  • "Interesting" in the unpleasant sense of the word, too
  • It has to be designed in from the very start

With Continuations

  • Really odd Lisp thing
  • Somewhat like closures for control flow
  • For reference: an exception is a simple kind of continuation
  • They destroy your brain
  • But once your brain is dead, they let you do some extremely clever things

PPM

30 October 2004


  1. When online, you can call PPM directly:
    1. Run 'cmd', cd to your Perl install directory (probably C:/Perl), then cd bin
    2. Here we install the PDF module, which doesn't ship with ActivePerl. Type:
      ppm install PDF
  2. When offline but you still want to install, follow me:
    1. Visit http://ppm.activestate.com/PPMPackages/zips/ and download PDF.zip, carry it over on a USB stick or floppy, then unzip it into the bin directory under your Perl install directory. That directory should now contain one extra folder and two extra files: the MSWin32-x86-multi-thread-5.8 folder, PDF.ppd, and README
    2. Run cmd, cd to your-Perl-directory/bin. Type:
      ppm install PDF.ppd
    3. Once the install finishes, you can delete that folder and those two files.
  3. If you run into a problem during installation, such as
    Failed to load PPM_DAT file
    then most likely the paths in Config.pm are wrong. Open lib/Config.pm, search for "installprivlib", and replace the paths that follow with your actual Perl path throughout the file. In my case this was largely because I had renamed the C:/Perl directory to C:/usr and forgotten to update Config.pm.
    If you hit some other problem, solve it as the situation demands.

My Newsgroup Setup Process

30 October 2004


I only started using newsgroups this year. I had come across them before, but mostly through a web interface, and the one inconvenience of web browsing is that you can't post replies. If you have a computer of your own, reading newsgroups with software such as Outlook Express is a good idea.
Below is my setup process, offered as a modest starting point; I hope it helps you fall in love with newsgroups.
Taking Outlook Express as an example:
  1. Tools - Accounts, Add - News...
  2. For the news (NNTP) server, enter news.yaako.com. Finished.
  3. When asked "Download newsgroups from the news account you added?", choose "Yes"
  4. Once the download finishes, subscribe to whichever newsgroups you like from the list
  5. The news.yaako.com server lets you reply to newsgroups directly, with no authentication needed
One more recommended server, which carries more groups: freenews.netfront.net

These are the newsgroups I chose:

cn.bbs.comp.lang.perl
comp.lang.perl.misc
comp.lang.perl.modules
Spend a little time on the newsgroups every day and you'll be richly rewarded.

Module

30 October 2004


The meaning of qw

What does qw mean in use Module qw(argu param);?
When the statement executes, the contents of qw become the arguments of Module's import function. A worked example:
#TEst.pm
package TEst;

sub import {
  print "@_";
}

1;

#test.pl
#!/usr/bin/perl
use TEst qw(argu param);

__OUTPUT__
TEst argu param
Reference: Randal L. Schwartz's Hey, 'use' guys! Import this!
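
For contrast, here is a minimal sketch of how a typical module gets the same effect by inheriting import from Exporter (the MyUtil name and hello function are invented for illustration):

#MyUtil.pm
package MyUtil;
use Exporter ();
our @ISA       = qw(Exporter);  # borrow Exporter's import()
our @EXPORT_OK = qw(hello);     # exported only when asked for

sub hello { print "hello, $_[0]\n"; }

1;

#elsewhere
use MyUtil qw(hello);  # Exporter's import() copies &hello into our namespace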

Finding installed modules

use ExtUtils::Installed;

my $inst = ExtUtils::Installed->new();
print join "\n", $inst->modules();
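
ExtUtils::Installed can also report each module's version. A small variation on the above (version() is a documented ExtUtils::Installed method; the '?' fallback is mine):

use ExtUtils::Installed;

my $inst = ExtUtils::Installed->new();
foreach my $module ($inst->modules()) {
    printf "%-30s %s\n", $module, $inst->version($module) || '?';
}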

Checking that a batch of modules is installed


#!/usr/bin/perl
# Check that the needed modules are installed on the current system
print "Content-Type: text/html\r\n\r\n";

while (<DATA>) {
    s/[^\w\:]*//g;       # strip everything except word characters and ::
    next unless ($_);
    eval("use $_;");     # try to load the module
    if ($@) {
        print "<font color=red>$_</font> is not installed.<br>";
    } else {
        print "<font color=blue>$_</font> is installed!<br>";
    }
}

# add the module names you want to check below (TEM_P is bogus, to demonstrate a failure)
__DATA__
strict
CGI::Carp
CGI
Net::SMTP
TEM_P

LB Post Machine

30 October 2004


#!/usr/bin/perl
#Save this code as water.pl
#Usage:perl water.pl
#Author: [email protected]

#use strict;
use LWP::UserAgent;
#use HTTP::Cookies;
#use HTTP::Headers;
use HTTP::Request;
#use HTTP::Response;

#my $browser = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';

my $ua = new LWP::UserAgent;
#$ua->agent($browser);

#my $header = new HTTP::Headers;
#$header->content_type('application/x-www-form-urlencoded');

my $count = 1;
my $sleeptime = 1;

# press Ctrl+C to quit
while (1) {
  # a random number keeps the subject and body from repeating between posts
  my $rands = rand(1000);
  # membername = registered username, password = password
  # forum = forum id, intopictitle = subject, inpost = body
  my $content = 
"action=addnew&forum=1&membername=test&password=testtest&intopictitle=1313s.com-$rands&inpost=1313s.com-$rands";
  # http://192.168.0.44/*/post.cgi is the target address
  my $request = new HTTP::Request('GET',"http://192.168.0.44/cgi-bin/lb/post.cgi?$content");

  #$header->content_length(length($content));
  #my $request = new HTTP::Request('POST',"http://192.168.0.44/cgi-bin/lb/post.cgi",$header,$content);

  my $response = $ua->request($request);

  # the connection failed - the target address may be wrong, or something else
  if (! $response->is_success) {
    die "failed reason: ", $response->status_line;
  }

  # the returned page contains "成功" ("success")
  if ($response->content=~ /成功/) {
    print "success $count";
    $count++;
  }
  # find the flood-control wait time and adjust our sleep time
  elsif ($response->content =~ /待.+?([0-9]+).+?秒/) {
    $sleeptime = $1;
    print "failed: the flood-prevention mechanism kicked in, so we changed the sleep time. Don't worry";
  } else {
    print "failed, reason unknown!";
  }
  # wait out the flood-prevention delay
  print ",wait $sleeptime seconds\n";
  sleep $sleeptime;
}

Special

This code is published for research and study only; no one may use it for illegal purposes.

Kwiki Installation Notes

30 October 2004


If you like, first try the simplest method, given at the very bottom.

Download the latest version of Kwiki from Kwiki's homepage: http://www.kwiki.org.
The whole Kwiki system is a single index.cgi that calls the Kwiki modules; every Kwiki feature is built from modules. I won't go into the code here - this is about my installation under Win32 (other OSes are similar).
Kwiki also calls two further modules written by the Kwiki author that aren't bundled in the downloaded Kwiki_*.tar.gz, so they must be installed first: IO::All and Spoon. They install like any other module, e.g. via cpan.
With those modules in place, enter the unpacked directory:

% perl Makefile.PL
% make test
% make install
Note: on Win32 that's nmake; if you don't have it, see Win32
Those three lines put Kwiki's family of modules into perl's site/lib.
I had unpacked into my cgi-bin directory, so next, as the README says, run:
% perl kwiki -new Kwiki
to install it into the Kwiki directory. That completes a basic Kwiki install, reachable at http://localhost|url/cgi-bin/Kwiki/index.cgi.
A Kwiki like this is rather bare, though; you'll want to install some plugins - visit http://www.kwiki.org/?KwikiPluginList. The plugins I picked are Kwiki::UserPreferences, Kwiki::GuestBook, and Kwiki::NewPage; other plugins install the same way. Installing a plugin is very simple: just install its corresponding module, exactly as for IO::All and Spoon.
Once that's done, move the kwiki file from the unpacked archive into the cgi-bin/Kwiki directory and run:
% perl kwiki -add Kwiki::UserPreferences Kwiki::GuestBook Kwiki::NewPage
And the Kwiki installation is finished.

Added @ 2004/8/22

Another way to update plugins, also described at http://www.kwiki.org:

  1. Install the plugin module
  2. Add the module name to the plugins file
  3. Run perl kwiki -update
These are my plugins:
Kwiki::Display
Kwiki::Edit
Kwiki::Htaccess
Kwiki::Theme::Basic
Kwiki::Toolbar
Kwiki::Status
Kwiki::Widgets
Kwiki::UserName
Kwiki::GuestBook
Kwiki::UserPreferences
Kwiki::NewPage
Kwiki::Search
Kwiki::Archive::Rcs
Kwiki::RecentChanges
Kwiki::Revisions

Added @ 2004/11/9

I reinstalled my system and discovered that
cpan Kwiki
alone is enough. It chases down the dependent modules automatically and drops kwiki.bat and kwiki into the Perl/bin directory.
After the install, open a console with cmd. For example, I installed into the E:\Fayland\cgi-bin\Kwiki directory:
E:\Fayland>kwiki -new cgi-bin\Kwiki

How to find a user's real IP

30 October 2004


Generally

Most programs write it like this (e.g. ikonboard, UBB, UltraThreads etc.):
$IP_ADDRESS = $ENV{'HTTP_X_FORWARDED_FOR'} || $ENV{'REMOTE_ADDR'};
    There are three cases here:
  • Through a transparent proxy, $ENV{'REMOTE_ADDR'} is the proxy's address while $ENV{'HTTP_X_FORWARDED_FOR'} carries your own address - which is why the code above prefers it.
  • Through an anonymous proxy, $ENV{'HTTP_X_FORWARDED_FOR'} is the proxy's address too, and $ENV{'REMOTE_ADDR'} is the address the proxy uses to fetch the page (which can differ from the proxy's advertised address).
  • With no proxy at all, $ENV{'HTTP_X_FORWARDED_FOR'} is empty and $ENV{'REMOTE_ADDR'} is the user's real IP
An example makes this clearer:
#the anonymous proxy is 80.59.189.28, and we get
$ENV{'REMOTE_ADDR'} = '80.58.3.235';
$ENV{'HTTP_X_FORWARDED_FOR'} = '80.59.189.28';
#no proxy
$ENV{'REMOTE_ADDR'} = '211.90.227.119'; #my real (temporary) ip
$ENV{'HTTP_X_FORWARDED_FOR'} = '';

Advanced

There are actually two more ENV variables: 'HTTP_CLIENT_IP' and 'X_CLIENT_IP', though they're not used much. Adding them gives code like the following; the order of the first three may vary (e.g. YaBB, X-Forum etc.):
$IP_ADDRESS = $ENV{'HTTP_X_FORWARDED_FOR'} || $ENV{'HTTP_CLIENT_IP'} || $ENV{'X_CLIENT_IP'} || $ENV{'REMOTE_ADDR'};

Special notes

None of this applies to localhost/127.0.0.1.
As for anonymous proxies: since one doesn't forward the IP of the person using it along with the request, obtaining that person's IP is mission impossible.
The environment variable that carries proxy information is $ENV{'HTTP_VIA'}.
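
To see what your own setup reports, here is a minimal CGI sketch (drop it into cgi-bin and request it with and without a proxy):

#!/usr/bin/perl
print "Content-Type: text/plain\r\n\r\n";
# dump the address-related environment variables discussed above
foreach my $key (qw(REMOTE_ADDR HTTP_X_FORWARDED_FOR HTTP_CLIENT_IP X_CLIENT_IP HTTP_VIA)) {
    printf "%-22s = %s\n", $key, defined $ENV{$key} ? $ENV{$key} : '';
}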


Gallery Thumbnail Generation

30 October 2004


Description

This script generates thumbnails for every image in a given directory, and can also generate an index page.

Prerequisites

use Image::Magick;

Script


#!/usr/bin/perl -w
use strict;
use CGI qw/:standard/;
use CGI::Carp qw(fatalsToBrowser);
use Image::Magick;

#settings
my $thumbs_width = 100; # thumbnail width
my $thumbs_height = 100; # thumbnail height
my $dir = 'E:/Fayland/Gallery'; # the directory to process
my $url = 'http://localhost/Gallery'; # the corresponding URL
my $need_index = 1; # 1 - produce index.html, 0 - ignore

my $query = new CGI;
print $query->header;

my @gallery;
(-d $dir) or die "Can't find $dir";
opendir(DIR, $dir) or die "Can't open $dir: $!";
@gallery = readdir(DIR);
closedir(DIR);
@gallery = grep(/\.(gif|jpe?g|png)$/i, @gallery);

my $i = 0;
open(FH, ">$dir/index.html") if ($need_index);
mkdir "$dir/thumbs" unless (-d "$dir/thumbs"); # make sure the thumbs dir exists
foreach (@gallery) {
    $_ =~ m/(.*)\.(.*)/;    # split name and extension; note the escaped dot
    my $thumbsfile = "$1.png";
    unless (-e "$dir/thumbs/$thumbsfile") {
        my $image=Image::Magick->new;
        my $x = $image->Read("$dir/$_");
        warn "$x" if "$x";
        $x = $image->Resize(width=>$thumbs_width, height=>$thumbs_height);
        $x = $image->Write("$dir/thumbs/$thumbsfile");
    }
    print "<a href='$url/$_'><img src='$url/thumbs/$thumbsfile' width='100' height='100' border='0' alt='$_' /></a>";
    print FH "<a href='$_'><img src='thumbs/$thumbsfile' width='100' height='100' border='0' alt='$_' /></a>" if ($need_index);
    $i++;
    if ($i % 5 == 0) {
        print "<br>";
        print FH "<br>" if ($need_index);
    }
}
close(FH) if ($need_index);

print "<p>Visit the <a href='$url/index.html'>IndexPage</a></p>" if ($need_index);

License

The same as Perl.

When perl is not quite fast enough

30 October 2004


Original URL: http://www.ccl4.org/~nick/P/Fast_Enough/


When perl is not quite fast enough

This is the paper I prepared for my talk at YAPC::EU::2002. It isn't a transcript of the talk I actually gave; rather, it's the pod notes that became the slides, plus the notes that lived on paper or in my head, which I've tried to work into a coherent article. I've also tried to add the useful feedback I received - where I can remember who said what, they get the credit.

You can view the slides here; I hope the changes made to produce them are easy to spot.

Introduction

You have some Perl programs, but they run too slowly, and you'd like to do something about it. This article looks at how to speed programs up, and how to avoid the problem in the first place.

The obvious things

Find a better algorithm
Your program runs using the most efficient method you could think of. But it's quite possible that someone looking at the problem from a completely different angle could find an algorithm that is 100 times faster. Are you sure you're using the best algorithm? Do some research.
Throw better hardware at it
If the program doesn't have to run on many machines, it may be cheaper to buy better hardware. After all, hardware keeps getting cheaper while programmers keep commanding high salaries. Maybe you can buy the performance with hardware; it might even be enough to compile a custom kernel for your machine.
mod_perl
For a CGI program I wrote, I found that even after I'd trimmed everything I could, the server could still serve only 2.5 requests per second. The same code on the same machine, but running under mod_perl, served 25 per second. That's a 10x speedup for not much effort. If your code won't run under mod_perl, there's also fastcgi (which CGI.pm supports). If your program isn't a CGI at all, consider a persistent perl daemon; see PPerl on CPAN.
Rewrite in C, or C++, sorry Java, I mean C#, oops no ...
Of course, the last "obvious" solution is to rewrite your code in a "native" language: C, C++, Java, C#, or whatever is currently fashionable.

But these may not be practical, or politically acceptable, solutions.

Some compromises

So, you can compromise.

XS
You may find that 5% of the code takes 95% of the runtime, spent doing things perl is inefficient at, such as bit shifting. You could write that bit in C, write the rest in Perl, and glue the two together with XS. But you'd have to learn XS and the perl API, and that's a lot of hard work.
Inline
Maybe you can use Inline instead. If you're manipulating perl's internals you still have to learn the perl API, but if you just need to call your own plain C code, or somebody else's C library, Inline makes it easy.

Here is my perl program, which calls the perl function rot32. And there is a C function rot32 that takes two integers, rotates the first by the second, and returns an integer result. That's all you need, and it works nicely.

    #!/usr/local/bin/perl -w
    use strict;
    
    printf "$_:\t%08X\t%08X\n", rot32 (0xdead, $_), rot32 (0xbeef, -$_)
      foreach (0..31);
    
    use Inline C => <<'EOC';
    
    unsigned rot32 (unsigned val, int by) {
      if (by >= 0)
        return (val >> by) | (val << (32 - by));
      return (val << -by) | (val >> (32 + by));
    }
    EOC
    __END__
    0:      0000DEAD        0000BEEF
    1:      80006F56        00017DDE
    2:      400037AB        0002FBBC
    3:      A0001BD5        0005F778
    4:      D0000DEA        000BEEF0
    ...
Compile your own perl?
Are you running your programs on the system-supplied perl? Compiling your own perl can make your programs run faster. For example, when threading is compiled into perl, all of its internal variables have to be thread-safe, which slows perl down a little. If your perl has threads available but you never use them, you're paying that speed cost for no reason. Likewise, you may have a better compiler than the one the system perl was built with: for example, I found gcc 3.2 made some of my C code 5% faster than 2.95 did. [One of my helpful hecklers said that he got a 14% speedup (if my memory serves) by recompiling his perl interpreter]
A different perl version?
Try a different version of perl. Different versions speed up different things. If you're running an old perl, try the latest one. If you're running the latest one but don't use its newest features, try an older version.

Banish the demons of stupidity

Are you using the best features the language has to offer?

hashes
To quote Larry Wall - doing a linear search of an associative array is like trying to club someone to death with a loaded Uzi.

I trust you're not doing that. But are you keeping your array sorted so that you can do a binary search? That's fast. But a hash should be faster still.

regexps
A language without regexps makes you write extra code to parse strings. perl has regexps, and rewriting with them may give you a 10x speedup. Even using the \G anchor and the /gc flags may make things faster.
    if ( /\G.../gc ) {
        ...
    } elsif ( /\G.../gc ) {
        ...
    } elsif ( /\G.../gc ) {
        ...
    }
pack and unpack
pack and unpack have more features than anyone remembers. Reread the manpage - you may find that one unpack can replace a whole subroutine, as in the sketch below.
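For instance, fixed-width fields come apart with a single unpack and no parsing code at all (the record layout here is invented for illustration):
    # "A10 A6" = two space-padded ASCII fields, 10 and 6 bytes wide
    my ($name, $dept) = unpack "A10 A6", $record;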
undef
undef. What do I mean, undef?

Are you computing a value only to throw it away?

For example the script in the Encode module that compiles character conversion tables would print out a warning if it saw the same character twice. If you or I build perl we'll just let those build warnings scroll off the screen - we don't care - we can't do anything about it. And it turned out that keeping track of everything needed to generate those warnings was slowing things down considerably. So I added a flag to disable that code, and perl 5.8 defaults to use it, so it builds more quickly.

An aside

Several of my helpful hecklers (mostly the London.pm members who previewed the talk (I count David Adler as part of London.pm because he's subscribed to the list)) wanted me to remind you that you really shouldn't optimise unless you absolutely have to. You make your code harder to maintain, harder to extend, and easier to introduce new bugs into. And quite possibly the spot that needs optimising is where you did something you shouldn't have in the first place.

I agree with all of this.

同样,我也不想改变幻灯片的运行顺序。There isn't a good order to try to describe things in, and some of the ideas that follow are actually more "good practice" than optimisation techniques, so possibly ought to come before the slides on finding slowness. I'll mark what I think are good habits to get into, and once you understand the techniques then I'd hope that you'd use them automatically when you first write code. That way (hopefully) your code will never be so slow that you actually want to do some of the brute force optimising I describe here.

Testing

You must not introduce new bugs
When you optimise existing working code, the most important thing is not to introduce new bugs.
Use all your regression tests :-)
Use your complete regression test suite, such as it is. You do have one, right?

[This is the point where you laugh nervously, because I'd bet very few people have a comprehensive test suite]

Keep the original program around
Keep a copy of the original code. It's your last resort if everything else fails. Use a version control system (CVS, say). Make an offline backup. Check that your backups are readable. You must not lose it.
Finally, the ultimate test of whether your optimising introduced bugs is whether the optimised version produces the same output as the original (while taking less time, of course).

What causes slowness

CPU
It's obvious that if your script hogs the CPU for 10 seconds solid, then to make it go faster you'll need to reduce the CPU demand.
RAM
A lesser cause of slowness is memory.
perl trades RAM for speed
One of the design decisions Larry made for perl was to trade memory for speed, choosing algorithms that use more memory to run faster. So perl tends to use more memory.
getting slower (relative to CPU)
CPUs keep getting faster. Memory is getting faster too. But not as quickly. So in relative terms memory is getting slower. [Larry was correct to choose to use more memory when he wrote perl5 over 10 years ago. However, in the future CPU speed will continue to diverge from RAM speed, so it might be an idea to revisit some of the CPU/RAM design trade offs in parrot]
memory like a pyramid

You can never have enough memory, and it's never fast enough.

Computer memory is like a pyramid. At the point you have the CPU and its registers, which are very small and very fast to access. Then you have 1 or more levels of cache, which is larger, close by and fast to access. Then you have main memory, which is quite large, but further away so slower to access. Then at the base you have disk acting as virtual memory, which is huge, but very slow.

Now, if your program is swapping out to disk, you'll realise, because the OS can tell you that it only took 10 seconds of CPU, but 60 seconds elapsed, so you know it spent 50 seconds waiting for disk and that's your speed problem. But if your data is big enough to fit in main RAM, but doesn't all sit in the cache, then the CPU will keep having to wait for data from main RAM. And the OS timers I described count that in the CPU time, so it may not be obvious that memory use is actually your problem.

This is the original code for the part of the Encode compiler (enc2xs) that generates the warnings on duplicate characters:

    if (exists $seen{$uch}) {
        warn sprintf("U%04X is %02X%02X and %02X%02X\n",
                     $val,$page,$ch,@{$seen{$uch}});
    }
    else {
        $seen{$uch} = [$page,$ch];
    }

It uses the hash %seen to remember all the Unicode characters that it has processed. The first time that it meets a character it won't be in the hash, the exists is false, so the else block executes. It stores an arrayref containing the code page and character number in that page. That's three things per character, and there are a lot of characters in Chinese.

If it ever sees the same Unicode character again, it prints a warning message. The warning message is just a string, and this is the only place that uses the data in %seen. So I changed the code - I pre-formatted that bit of the error message, and stored a single scalar rather than the three:

    if (exists $seen{$uch}) {
        warn sprintf("U%04X is %02X%02X and %04X\n",
                     $val,$page,$ch,$seen{$uch});
    }
    else {
        $seen{$uch} = $page << 8 | $ch;
    }

That reduced the memory usage by a third, and it runs more quickly.

Step by step

How do you make things faster? Well, this is something of a black art, down to trial and error. I'll expand on aspects of these 4 points in the next slides.

What's slow?
You have to find out what really is slow. Effort spent on code that's already fast enough is wasted - spend it where it will buy you the biggest return.
Consider a rewrite
However much you swear at it, not all code can be made faster in place. What's left, if you really need the speed, is to find a different way of doing the same job.
Try it
But that may not work out. Check that the new code really is faster and gives the same results.
Note results
Either way, note your results - I find a comment in the code is good. It's important if an idea didn't work, because it stops you or anyone else going back and trying the same thing again. And it's important if a change does work, as it stops someone else (such as yourself next month) tidying up an important optimisation and losing you that hard won speed gain.

By having commented out slower code near the faster code you can look back and get ideas for other places you might optimise in the same way.

Simple little things

Below are some habits I find useful, worth applying in your everyday coding.

AutoSplit and AutoLoader
If you're writing modules, the AutoSplit and AutoLoader modules let perl load only the parts of a module that a particular program actually uses. You win twice - no CPU is wasted at startup compiling parts of the module you won't use, and no RAM is wasted holding their compiled structures. So your module loads faster and uses less RAM.

One possible problem is that AutoLoaded subroutines make debugging confusing. While testing, you can disable AutoLoader by commenting out the __END__ that precedes the autoloaded subroutines. That way they are loaded, compiled, and debugged in the ordinary fashion.

  ...
  1;
  # While debugging, disable AutoLoader like this:
  # __END__
  ...

Of course, to keep use happy, you'll need to add another 1; at the end of the load-on-demand section, and possibly another __END__
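
Putting the pieces together, a skeleton of a module using AutoLoader might look like this (the package name and subroutine bodies are invented for illustration):

  package My::Big;
  use strict;
  use AutoLoader 'AUTOLOAD';    # AUTOLOAD fetches the split-out subs on demand

  sub always_needed { return 1 }  # compiled when the module is loaded

  1;
  # While debugging, comment out the next line so everything compiles up front
  __END__

  sub rarely_needed {    # AutoSplit carves this out into its own .al file at
      return "results";  # install time; it's compiled only on the first call
  }

  1;    # keeps 'use' happy if __END__ is commented out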

Schwern notes that commenting out __END__ can cause surprises if the main body of your module is running under use strict; because now your AutoLoaded subroutines will suddenly find themselves being run under use strict. This is arguably a bug in the current AutoSplit - when it runs at install time to generate the files for AutoLoader to use it doesn't add lines such as use strict; or use warnings; to ensure that the split out subroutines are in the same environment as was current at the __END__ statement. This may be fixed in 5.10.

Elizabeth Mattijsen notes that there are different memory use versus memory shared issues when running under mod_perl, with different optimal solutions depending on whether your apache is forking or threaded.

=pod @ __END__
If you have a big chunk of pod describing your code, try hard not to put it at the top of the file. The perl parser can skip pod quickly, but that isn't magic; it still takes a little time. Moreover, the pod has to be read in from disk just to be ignored.
  #!perl -w
  use strict;
  =head1 You don't want to do that
  big block of pod
  =cut
  ...
  1;
  __END__
  =head1 You want to do this

If you put your pod after an __END__ token, the perl parser never even sees it. That saves a little CPU, and if the pod block is big (>4K) it means the final disk block(s) of the file never need to be read into RAM at all. That may win you some speed. [A helpful heckler observed that modern raid systems may well be reading in 64K chunks, and modern OSes are getting good at read ahead, so not reading a block as a result of =pod @ __END__ may actually be quite rare.]

If you keep your pod (and your tests) next to the code they document (which looks like the better habit anyway), then this advice doesn't apply to you.

Needless importing is slow

Exporter is written in perl. It's fast, but it's not instantaneous.

Many modules, to save you typing, export lots of functions and symbols into your namespace by default. If you give no arguments after the module's name, such as

    use POSIX;          # Exports all the defaults

then POSIX will helpfully export its default list of symbols into your namespace. If you put a list after the module's name, it exports only the symbols on that list. And if that list is empty, it exports nothing:

    use POSIX ();       # Exports nothing.

You can still use all the functions and the other symbols - you just have to use their full names, by typing the POSIX:: prefix. Many people say this actually makes your code cleaner, since it's now obvious where each subroutine is defined. On top of that, it's faster:

    use POSIX;      0.516s        use POSIX ();     0.355s
    use Socket;     0.270s        use Socket ();    0.231s

POSIX exports a huge number of symbols by default. If you tell it not to export anything, it starts up in 30% less time. Socket starts in 15% less time.
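
The fully-qualified style then looks like this (POSIX::floor is one of the functions POSIX would otherwise export by default):

    use POSIX ();                         # import nothing
    my $price = POSIX::floor(2.71828);    # call it by its full name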

regexps

avoid $&
The $& variable returns the last text successfully matched in any regular expression. It's not lexically scoped, so unlike the match variables $1 etc it isn't reset when you leave a block. This means that to be correct perl has to keep track of it from any match, as perl has no idea when it might be needed. As it involves taking a copy of the matched string, it's expensive for perl to keep track of. If you never mention $&, then perl knows it can cheat and never store it. But if you (or any module) mentions $& anywhere then perl has to keep track of it throughout the script, which slows things down. So it's a good idea to capture the whole match explicitly if that's what you need.
    $text =~ /.* rules/;
    $line = $&;                 # Now every match will copy $& - slow
    $text =~ /(.* rules)/;
    $line = $1;                 # Didn't mention $& - fast
avoid use English;
use English gives helpful long names to all the punctuation variables. Unfortunately that includes aliasing $& to $MATCH which makes perl think that it needs to copy every match into $&, even if you script never actually uses it. In perl 5.8 you can say use English '-no_match_vars'; to avoid mentioning the naughty "word", but this isn't available in earlier versions of perl.
avoid needless captures
Are you using parentheses for capturing, or just for grouping? Capturing makes perl copy the matched string into $1 etc, so if all you need is grouping, use the non-capturing (?:...) instead of the capturing (...).
/.../o;
If you define scalars with building blocks for your regexps, and then make your final regexp by interpolating them, then your final regexp isn't going to change. However, perl doesn't realise this, because it sees that there are interpolated scalars each time it meets your regexp, and has no idea that their contents are the same as before. If your regexp doesn't change, then use the /o flag to tell perl, and it will never waste time checking or recompiling it.
but don't blow it
You can use the qr// operator to pre-compile your regexps. It often is the easiest way to write regexp components to build up more complex regexps. Using it to build your regexps once is a good idea. But don't screw up (like parrot's assemble.pl did) by telling perl to recompile the same regexp every time you enter a subroutine:
    sub foo {
        my $reg1 = qr/.../;
        my $reg2 = qr/... $reg1 .../;
        ...
    }

You should pull those two regexp definitions out of the subroutine into package variables, or file scoped lexicals.
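
A sketch of the repaired shape - the qr//s move out to file scope so they compile only once (names carried over from the fragment above):

    my $reg1 = qr/.../;               # file-scoped lexicals, compiled once
    my $reg2 = qr/... $reg1 .../;
    sub foo {
        # match against $reg1 and $reg2 here; nothing recompiles per call
    }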

Devel::DProf

You find what is slow by using a profiler. People often guess where they think their program is slow, and get it hopelessly wrong. Use a profiler.

Devel::DProf is in the perl core from version 5.6. If you're using an earlier perl you can get it from CPAN.

You run your program with -d:DProf

    perl5.8.0 -d:DProf enc2xs.orig -Q -O -o /dev/null ...

which times things and stores the data in a file named tmon.out. Then you run dprofpp to process the tmon.out file, and produce meaningful summary information. This excerpt is the default length and format, but you can use options to change things - see the man page. It also seems to show up a minor bug in dprofpp, because it manages to total things up to get 106%. While that's not right, it doesn't affect the explanation.

    Total Elapsed Time = 66.85123 Seconds
      User+System Time = 62.35543 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     106.   66.70 102.59 218881   0.0003 0.0005  main::enter
     49.5   30.86 91.767      6   5.1443 15.294  main::compile_ucm
     19.2   12.01  8.333  45242   0.0003 0.0002  main::encode_U
     4.74   2.953  1.078  45242   0.0001 0.0000  utf8::unicode_to_native
     4.16   2.595  0.718  45242   0.0001 0.0000  utf8::encode
     0.09   0.055  0.054      5   0.0109 0.0108  main::BEGIN
     0.01   0.008  0.008      1   0.0078 0.0078  Getopt::Std::getopts
     0.00   0.000 -0.000      1   0.0000      -  Exporter::import
     0.00   0.000 -0.000      3   0.0000      -  strict::bits
     0.00   0.000 -0.000      1   0.0000      -  strict::import
     0.00   0.000 -0.000      2   0.0000      -  strict::unimport

At the top of the list, the subroutine enter takes about half the total CPU time, with 200,000 calls, each very fast. That makes it a good candidate to optimise, because all you have to do is make a slight change that gives a small speedup, and that gain will be magnified 200,000 times. [It turned out that enter was tail recursive, and part of the speed gain I got was by making it loop instead]

Third on the list is encode_U, which with 45,000 calls is similar, and worth looking at. [Actually, it was trivial code and in the real enc2xs I inlined it]

utf8::unicode_to_native and utf8::encode are built-ins, so you won't be able to change that.

Don't bother below there, as you've accounted for 90% of total program time, so even if you did a perfect job on everything else, you could only make the program run 10% faster.

compile_ucm is trickier - it's only called 6 times, so it's not obvious where to look for what's slow. Maybe there's a loop with many iterations. But now you're guessing, which isn't good.

One trick is to break it into several subroutines, just for benchmarking, so that DProf gives you times for different bits. That way you can see where the juicy bits to optimise are.

Devel::SmallProf should do line by line profiling, but every time I use it it seems to crash.

Benchmark

Now you've identified the slow spot, you need to work out which of two candidate pieces of code is faster. The Benchmark module makes this easy. A particularly good subroutine is cmpthese, which takes code snippets and draws up a comparison chart. cmpthese comes with the Benchmark shipped with perl 5.6.

So to compare two code snippets, orig and new, by running each 10000 times, you'd do this:

    use Benchmark ':all';
    
    sub orig {
       ...
    }
    
    sub new {
       ...
    }
    
    cmpthese (10000, { orig => \&orig, new => \&new } );

Benchmark runs both, times them, and prints out a helpful comparison chart:

    Benchmark: timing 10000 iterations of new, orig...
           new:  1 wallclock secs ( 0.70 usr +  0.00 sys =  0.70 CPU) @ 14222.22/s (n=10000)
          orig:  4 wallclock secs ( 3.94 usr +  0.00 sys =  3.94 CPU) @ 2539.68/s (n=10000)
            Rate orig  new
    orig  2540/s   -- -82%
    new  14222/s 460%   --

It's easy to see that the new code is over five times faster than the original.

What causes slowness in perl?

Actually, I didn't tell the whole truth earlier about what causes slowness in perl. [And astute hecklers such as Philip Newton had already told me this]

When perl compiles your program it breaks it down into a sequence of operations it must perform, which are usually referred to as ops. So when you ask perl to compute $a = $b + $c it actually breaks it down into these ops:
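
(The slides showed the op tree at this point. As a rough sketch, this is the shape that the B::Terse backend described below prints for that statement, with the addresses elided:)

    LISTOP  leave
        OP      enter
        COP     nextstate
        BINOP   sassign
            BINOP   add [1]
                UNOP    null [15]
                    SVOP    gvsv  GV  *b
                UNOP    null [15]
                    SVOP    gvsv  GV  *c
            UNOP    null [15]
                SVOP    gvsv  GV  *a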

Computers are fast at simple things like addition. But there is quite a lot of overhead involved in keeping track of "which op am I currently performing" and "where is the next op", and this book-keeping often swamps the time taken to actually run the ops. So often in perl it's the number of ops your program takes to perform its task that is more important than the CPU they use or the RAM it needs. The hit list is

  1. Ops
  2. CPU
  3. RAM

So what were my example code snippets that I Benchmarked?

It was code to split a line of hex (54726164696e67207374796c652f6d61) into groups of 4 digits (5472 6164 696e ...) , and convert each to a number

    sub orig {
       map {hex $_} $line =~ /(....)/g;
    }
    sub new {
       unpack "n*", pack "H*", $line;
    }

The two produce the same results:


orig: 21618, 24932, 26990, 26400, 29556, 31084, 25903, 28001, 26990, 29793, 26990, 24930, 26988, 26996, 31008, 26223, 29216, 29552, 25957, 25646
new:  21618, 24932, 26990, 26400, 29556, 31084, 25903, 28001, 26990, 29793, 26990, 24930, 26988, 26996, 31008, 26223, 29216, 29552, 25957, 25646

but the first one is much slower. Why? Following the data path from right to left, it starts well with a global regexp, which is only one op and therefore a fast way to generate the list of 4 digit groups. But that map block is actually an implicit loop, so for each 4 digit block it iterates round and repeatedly calls hex. That's at least one op for every list item.

Whereas the second one has no loops in it, implicit or explicit. It uses one pack to convert the hex temporarily into a binary string, and then one unpack to convert that string into a list of numbers. n is big endian 16 bit quantities. I didn't know that - I had to look it up. But when the profiler told me that this part of the original code was a performance bottleneck, the first thing I did was to look at the pack docs to see if I could use some sort of pack/unpack as a speedier replacement.

Ops are bad, m'kay

You can ask perl to tell you the ops that it generates for particular code with the Terse backend to the compiler. For example, here's a 1 liner to show the ops in the original code:

$ perl -MO=Terse -e'map {hex $_} $line =~ /(....)/g;'

    LISTOP (0x16d9c8) leave [1]
        OP (0x16d9f0) enter
        COP (0x16d988) nextstate
        LOGOP (0x16d940) mapwhile [2]
            LISTOP (0x16d8f8) mapstart
                OP (0x16d920) pushmark
                UNOP (0x16d968) null
                    UNOP (0x16d7e0) null
                        LISTOP (0x115370) scope
                            OP (0x16bb40) null [174]
                            UNOP (0x16d6e0) hex [1]
                                UNOP (0x16d6c0) null [15]
                                    SVOP (0x10e6b8) gvsv  GV (0xf4224) *_
                PMOP (0x114b28) match /(....)/
                    UNOP (0x16d7b0) null [15]
                        SVOP (0x16d700) gvsv  GV (0x111f10) *line

At the bottom you can see how the match /(....)/ is just one op. But the next diagonal line of ops from mapwhile down to the match are all the ops that make up the map. Lots of them. And they get run each time round map's loop. [Note also that the {}s mean that map enters scope each time round the loop. That's not a trivially cheap op either]

Whereas my replacement code looks like this:

$ perl -MO=Terse -e'unpack "n*", pack "H*", $line;'

    LISTOP (0x16d818) leave [1]
        OP (0x16d840) enter
        COP (0x16bb40) nextstate
        LISTOP (0x16d7d0) unpack
            OP (0x16d7f8) null [3]
            SVOP (0x10e6b8) const  PV (0x111f94) "n*"
            LISTOP (0x115370) pack [1]
                OP (0x16d7b0) pushmark
                SVOP (0x16d6c0) const  PV (0x111f10) "H*"
                UNOP (0x16d790) null [15]
                    SVOP (0x16d6e0) gvsv  GV (0x111f34) *line

There are fewer ops in total. And no loops, so all the ops you see execute only once. :-)

[My helpful hecklers pointed out that it's hard to work out what an op is. Good call. There's roughly one op per symbol (function, operator, variable name, and any other bit of perl syntax). So if you golf down the number of functions and operators your program runs, then you'll be reducing the number of ops.]

[These were supposed to be the bonus slides. I talked too fast (quelle surprise) and so managed to actually get through the lot with time for questions]

Memoize

Caches function results
MJD's Memoize follows the grand perl tradition by trading memory for speed. You tell Memoize the name(s) of functions you'd like to speed up, and it does symbol table games to transparently intercept calls to them. It looks at the parameters the function was called with, and uses them to decide what to do next. If it hasn't seen a particular set of parameters before, it calls the original function with the parameters. However, before returning the result, it stores it in a hash for that function, keyed by the function's parameters. If it has seen the parameters before, then it just returns the result direct from the hash, without even bothering to call the function.
For functions that only calculate
This is useful for functions that calculate things with no side effects, slow functions that you often call repeatedly with the same parameters. It's not useful for functions that do things external to the program (such as generating output), nor is it good for very small, fast functions.
Can tie cache to a disk file
The hash Memoize uses is a regular perl hash. This means that you can tie the hash to a disk file. This allows Memoize to remember things across runs of your program. That way, you could use Memoize in a CGI to cache static content that you only generate on demand (but remember you'll need file locking). The first person who requests something has to wait for the generation routine, but everyone else gets it straight from the cache. You can also arrange for another program to periodically expire results from the cache.

As of 5.8 the Memoize module has been assimilated into the core. Users of earlier perls can get it from CPAN.
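
A minimal sketch of Memoize in use - fib is the classic demonstration, and memoize() is the module's documented interface:

    use Memoize;

    sub fib {
        my $n = shift;
        return $n if $n < 2;
        return fib($n - 1) + fib($n - 2);
    }

    memoize('fib');      # replaces fib with a caching wrapper
    print fib(50), "\n"; # the recursive calls now hit the cache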

Miscellaneous

These are quite general ideas for optimisation that aren't particularly perl specific.

Pull things out of loops
perl's hash lookups are fast. But they aren't as fast as a lexical variable. enc2xs was calling a function each time round a loop based on a hash lookup using $type as the key. The value of $type didn't change, so I pulled the lookup out above the loop into a lexical variable:
    my $type_func = $encode_types{$type};

and doing it only once was faster.

Experiment with number of arguments
Something else I found was that enc2xs was calling a function which took several arguments from a small number of places. The function contained code to set defaults if some of the arguments were not supplied. I found that the way the program ran, most of the calls passed in all the values and didn't need the defaults. Changing the function to not set defaults, and writing those defaults out explicitly where needed bought me a speed up.
Tail recursion
Tail recursion is where the last thing a function does is call itself again with slightly different arguments. It's a common idiom, and some languages can automatically optimise it away. Perl is not one of those languages. So every time a function tail recurses you have another subroutine call [not cheap - Arthur Bergman notes that it is 10 pages of C source, and will blow the instruction cache on a CPU] and re-entering that subroutine again causes more memory to be allocated to store a new set of lexical variables [also not cheap].

perl can't spot that it could just throw away the old lexicals and re-use their space, but you can, so you can save CPU and RAM by re-writing your tail recursive subroutines with loops. In general, trying to reduce recursion by replacing it with iterative algorithms should speed things up.
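
As a sketch, here is a trivial (invented) tail recursive countdown and its loop rewrite:

    sub countdown {              # tail recursive: a new call frame and a
        my $n = shift;           # new set of lexicals on every step
        return if $n < 0;
        print "$n\n";
        countdown($n - 1);
    }

    sub countdown_loop {         # the same job with one frame and one $n
        my $n = shift;
        while ($n >= 0) {
            print "$n\n";
            $n--;
        }
    }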

yay for y

y, or tr, is the transliteration operator. It's not as powerful as the general purpose regular expression engine, but for the things it can do it is often faster.

tr/!// # the faster way to count characters
tr doesn't delete characters unless you use /d. If you supply no replacement characters it leaves the string alone (a read-only pass). In scalar context it returns the number of characters matched, which makes it the fastest way to count occurrences of a single character or set of characters - faster than m/.../g in list context. (But if you only need to know whether a character occurs at all, use m/.../, which can stop at the first hit, whereas tr/// scans to the end.)
tr/q/Q/ is faster than s/q/Q/g
tr is also faster than the regexp engine for doing character-for-character substitutions.
tr/a-z//d is faster than s/[a-z]//g
tr is faster than the regexp engines for doing character range deletions. [When writing the slide I assumed that it would be faster for single character deletions, but I Benchmarked things and found that s///g was faster for them. So never guess timings; always test things. You'll be surprised, but that's better than being wrong]
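
For example, counting newlines to get a line count (tr/// in scalar context returns how many characters it matched):

    my $lines  = ($text =~ tr/\n//);      # count a single character - fast
    my $vowels = ($text =~ tr/aeiou//);   # a character list works too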

Ops are bad, m'kay

Another example lifted straight from enc2xs of something that I managed to accelerate quite a bit by reducing the number of ops run. The code takes a scalar, and prints out each byte as \x followed by 2 digits of hex, as it's generating C source code:

    #foreach my $c (split(//,$out_bytes)) {
    #  $s .= sprintf "\\x%02X",ord($c);
    #}
    # 9.5% faster changing that loop to this:
    $s .= sprintf +("\\x%02X" x length $out_bytes), unpack "C*", $out_bytes;

The original makes a temporary list with split [not bad in itself - ops are more important than CPU or RAM] and then loops over it. Each time round the loop it executes several ops, including using ord to convert the byte to its numeric value, and then using sprintf with the format "\\x%02X" to convert that number to the C source.

The new code effectively merges the split and looped ord into one op, using unpack's C format to generate the list of numeric values directly. The more interesting (arguably sick) part is the format to sprintf, which is inside +(...). You can see from the .= in the original that the code is just concatenating the converted form of each byte together. So instead of making sprintf convert each value in turn, only for perl ops to stick them together, I use x to replicate the per-byte format string once for each byte I'm about to convert. There's now one "\\x%02X" for each of the numbers in the list passed from unpack to sprintf, so sprintf just does what it's told. And sprintf is faster than perl ops.

How to make perl fast enough

use the language's fast features
You have enormous power at your disposal with regexps, pack, unpack and sprintf. So why not use them?

All the pack and unpack code is implemented in pure C, so doesn't have any of the book-keeping overhead of perl ops. sprintf too is pure C, so it's fast. The regexp engine uses its own private bytecode, but it's specially tuned for regexps, so it runs much faster than general perl code. And the implementation of tr has less to do than the regexp engine, so it's faster.

For maximum power, remember that you can generate regexps and the formats for pack, unpack and sprintf at run time, based on your data.

give the interpreter hints
Make it obvious to the interpreter what you're up to. Avoid $&, use (?:...) when you don't need capturing, and put the /o flag on constant regexps.
less OPs
Try to accomplish your tasks using less operations. If you find you have to optimise an existing program then this is where to start - golf is good, but remember it's run time strokes not source code strokes.
less CPU
Usually you want to find ways of using less CPU.
less RAM
but don't forget to think about how your data structures work to see if you can make them use less RAM.