#1

Posted 2016-08-04 17:28
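My initial approach was:
read the whole file, split it on whitespace into an array, then walk the array to build a hash. But if the text is long, or there are several texts,
storing everything in an array first seems wasteful and slow. How can I improve this so that the hash is built directly while the file is being read, without a temporary array?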
text_in:
The U.N. Food and Agriculture Organization says it has less than half the funding it needs to help ensure food security in parts of South Sudan.
.......
(Too long to post in full; assume the text is well-formed.)

I want to build the following hash %Words:
(
The => 1,
U.N. => 1,
Food => 1,
...
)

My earlier idea was:
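# $IN1 is assumed to be a filehandle already opened on text_in.
my $content;

{
    local $/ = undef;      # slurp mode: read the whole file in one go
    $content = <$IN1>;
    close($IN1);
}

# split ' ' skips leading whitespace and treats runs of whitespace as one
# separator; split /\s/ would yield empty strings between consecutive spaces.
my @words1 = split ' ', $content;
my %Words1 = map { $_ => 1 } @words1;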

Is it possible to skip the temporary array and build the hash directly? Would that be faster?


sunzhiguolu

#2

Posted 2016-08-04 18:34

perl -anle '{$h{$_}++ for @F}END{$,=",";print keys %h}' f
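For readers less familiar with the command-line switches, here is a rough long-hand sketch of what the one-liner does. The helper name count_words and the in-memory sample text are assumptions made for illustration, not part of the original post, and the keys are sorted here only so the output is stable (the one-liner prints them in hash order).

```perl
use strict;
use warnings;

# Rough long-hand equivalent of: perl -anle '{$h{$_}++ for @F}END{...}' f
# count_words() is a hypothetical helper name, not taken from the thread.
sub count_words {
    my ($fh) = @_;
    my %h;
    while (my $line = <$fh>) {       # -n: loop over the input line by line
        chomp $line;                 # -l: strip the trailing newline
        my @F = split ' ', $line;    # -a: autosplit on whitespace into @F
        $h{$_}++ for @F;             # count every word on the line
    }
    return \%h;
}

# Demo on an in-memory filehandle so the sketch runs without an input file.
my $sample = "The U.N. Food and Agriculture\nThe Food\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $counts = count_words($fh);
close $fh;

{
    local $, = ",";      # output field separator, as in the one-liner
    local $\ = "\n";     # -l also appends a newline to each print
    print sort keys %$counts;
}
```

Only one line is held in memory at a time, so no array of all words in the file is ever built.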


104359176

#3

Posted 2016-08-04 22:06

Using local $/ is just an easy way to slurp the whole text into one string. It has nothing to do with speed.

If you want it to run faster, use an array and uniq it.
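One possible reading of "use an array and uniq it" is sketched below. The helper name uniq_words and the sample string are assumptions for illustration. Note that this common grep idiom still relies on a hash internally to remember what it has seen, so whether it is actually faster than building the hash directly would need to be measured. Newer versions of the core List::Util module also export a uniq function that could replace the grep.

```perl
use strict;
use warnings;

# uniq_words() is a hypothetical helper; it keeps words in first-seen order.
sub uniq_words {
    my ($text) = @_;
    my %seen;
    # Keep a word only the first time it appears.
    return grep { !$seen{$_}++ } split ' ', $text;
}

my $content = "The U.N. Food and the Food and Agriculture Organization";
my @unique  = uniq_words($content);
print "@unique\n";
```

Unlike keys on a hash, the resulting array preserves the order in which the words first occur.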


jason680

#4

Posted 2016-08-04 23:20

Last edited by jason680 on 2016-08-05 11:02

Reply to #1 大山里出来的孩子

$ perl words.pl text_in
the half ensure of needs has Sudan. Food Agriculture to funding less in help says Organization it South than U.N. food parts security and The

$ cat words.pl
use strict;
use warnings;

my %hWord;

while (<>) {
    chomp;
    $hWord{$_} = 1 for (split);
}
print join(" ", keys %hWord), "\n";
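A variation on the same line-by-line idea is to assign to a hash slice, which creates all keys for a line in one statement. This is only a sketch: the helper name word_set and the in-memory sample are assumptions, and the values are left undef, so membership has to be checked with exists rather than by truthiness.

```perl
use strict;
use warnings;

# word_set() is a hypothetical helper name, not taken from the thread.
sub word_set {
    my ($fh) = @_;
    my %words;
    while (my $line = <$fh>) {
        # Hash slice: create one key per word on this line.
        # Values stay undef, so test membership with exists().
        @words{ split ' ', $line } = ();
    }
    return \%words;
}

# Demo on an in-memory filehandle so the sketch runs without an input file.
my $sample = "The U.N. Food and Agriculture Organization says\n"
           . "it has less than half the funding\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $set = word_set($fh);
close $fh;

print exists $set->{Food} ? "Food: found\n" : "Food: missing\n";
```

If the 1 values from the original hash are needed, the loop form `$words{$_} = 1 for split ' ', $line;` does the same job.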