A Perl script that takes the MD5 hash of every file under several directories and lists the files that match
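It has been a very long time since my last entry.
The other day I brought over the data from a file server at another site and was about to copy it onto the file server here, but a lot of the files seemed to have identical contents. So I figured I would take the MD5 hash of each file, treat files with the same hash as having the same contents, and delete the extras.
The script itself is further down. Run it like this: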
$ ./filecomp.pl /hoge /booboo
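and it throws every file under /hoge and /booboo (subdirectories included) into one pool, computes the MD5 hashes, and prints the full paths of the files whose hashes match.
For example, if /hoge/ahhan.txt, /hoge/uhhun.txt and /booboo/ahhan.txt all match, the output looks like this (the order is not predictable):
/hoge/ahhan.txt(tab)/hoge/uhhun.txt
(two tabs)/booboo/ahhan.txt
That is, when more than two files are duplicates, the third and later ones are listed one per line after two tabs.
Here is the code.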
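#!/usr/bin/perl
use strict;
use warnings;
use File::Util;
use Digest::MD5;

# max_dives is raised well above the File::Util default (see the note below the code)
my $file_obj = File::Util->new(max_dives => 50000);
my @total_filelist;
my %key_filename;    # full path => MD5 hex digest
my %key_hashed;      # MD5 hex digest => most recently seen full path
my $flag_overtwo;    # defined once the current digest has already printed a pair

# Loop 1: collect every file under the directories given as arguments
foreach (@ARGV) {
    my @temp_filelist = $file_obj->list_dir($_, '--files-only', '--recurse');
    push @total_filelist, @temp_filelist;
}

# Loop 2: compute the MD5 digest of each file
foreach (@total_filelist) {
    open(my $target, '<', $_) || die "Can't open file, name: $_ :$!";
    binmode($target);
    $key_filename{$_} = Digest::MD5->new->addfile($target)->hexdigest;
    close($target);
}

# Loop 3: walk the paths in digest order so equal digests sit next to each other
foreach (sort { $key_filename{$b} cmp $key_filename{$a} } keys %key_filename) {
    if ($key_hashed{$key_filename{$_}}) {
        if (!defined($flag_overtwo)) {
            # first duplicate of this digest: previous path, tab, this path
            print "$key_hashed{$key_filename{$_}}\t$_\n";
            $flag_overtwo = "deja été";
        }
        else {
            # third and later duplicates: two tabs, then the path
            print "\t\t$_\n";
        }
    }
    else {
        undef($flag_overtwo);
    }
    $key_hashed{$key_filename{$_}} = $_;
}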
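The max_dives given when creating the File::Util object is, roughly speaking, the maximum number of files (directories included?) that it will "dig" through. The default was far too small for my purposes, so I raised it. Check the documentation on CPAN and either stay with the default or raise it as your situation requires.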
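If tuning max_dives is a nuisance, the file list could also be gathered with File::Find, which ships with Perl. What follows is only a minimal sketch under that assumption, not what the script above does; the sub name collect_files is made up for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (not part of the original script): return every
# regular file under the given directories, as reported by File::Find.
sub collect_files {
    my @dirs = @_;
    my @files;
    find(
        {
            # no_chdir keeps $File::Find::name usable as-is inside wanted
            no_chdir => 1,
            wanted   => sub {
                # -f follows symlinks, so skip links explicitly
                push @files, $File::Find::name
                    if -f $File::Find::name && !-l $File::Find::name;
            },
        },
        @dirs
    );
    return @files;
}

if (@ARGV) {
    print "$_\n" for collect_files(@ARGV);
}
```

The paths come back prefixed with whatever directory name was passed in, so giving absolute directories yields full paths, as in the script above.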
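The first loop walks everything under the directories given as arguments and stuffs the full paths into the array total_filelist. The second loop computes the MD5 hash of every file in total_filelist and stuffs the results into the hash key_filename, with the file name as key and the hash value as value. The last loop sorts %key_filename by value (that is, by MD5 hash value) and builds the hash key_hashed, with the MD5 hash value as key and the full path as value (in other words, it does reverse(%hash) one entry at a time). Because a repeated MD5 key gets overwritten, the loop writes to standard output whenever the key already exists. Once a line has been written for a key, the flag flag_overtwo is set, and further duplicates of the same MD5 hash value come out as two tabs followed by the newest value (the full path).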
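The sort-and-flag bookkeeping in the last loop can also be expressed by first grouping the paths under their digest. The following is a sketch of that alternative, not the script above; group_by_digest and format_group are hypothetical names, and the output layout is the tab layout described earlier.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: map each MD5 hex digest to the list of paths
# whose contents produce it.
sub group_by_digest {
    my @paths = @_;
    my %paths_for;
    for my $path (@paths) {
        open(my $fh, '<', $path) or die "Can't open file, name: $path :$!";
        binmode($fh);
        my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
        close($fh);
        push @{ $paths_for{$digest} }, $path;
    }
    return \%paths_for;
}

# Hypothetical helper: first two paths tab-separated on one line,
# every later path on its own line after two tabs.
sub format_group {
    my ($first, $second, @rest) = @_;
    return join('', "$first\t$second\n", map("\t\t$_\n", @rest));
}

my $groups = group_by_digest(@ARGV);
for my $digest (sort keys %$groups) {
    my @same = sort @{ $groups->{$digest} };
    print format_group(@same) if @same > 1;
}
```

Unlike the original, this takes file paths (not directories) as arguments, and sorts the paths inside each group so the output order is stable.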
This may well be a grand reinvention of a very large wheel; if so, I would appreciate it if you quietly let me know.
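At the moment, to judge whether two whole directories are the same, I take the directories that turn up in the script's output and check them with something like
$ cd /ahhan
$ find . -type f | sort > /tmp/ahhan
$ cd /uhhun
$ find . -type f | sort > /tmp/uhhun
$ wc -l /tmp/ahhan /tmp/uhhun
or
$ diff /tmp/ahhan /tmp/uhhun
and then delete one of the directories wholesale. (The lists go to /tmp so that they do not show up in their own listing, and they are sorted so that diff compares like with like.) I did briefly think about putting this into the script too, but writing a half-baked file manager would be too much, and presumptuous as well, so for the sake of generality I use the script as it is above.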
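For reference, the same check could be done in Perl. This is a minimal sketch assuming File::Find and File::Spec (both bundled with Perl); relative_files and only_in_first are hypothetical names, and like the find listing it compares file names only, not contents.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Hypothetical helper: set of regular-file paths under $dir,
# expressed relative to $dir.
sub relative_files {
    my ($dir) = @_;
    my %seen;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                return unless -f $File::Find::name;
                $seen{ File::Spec->abs2rel($File::Find::name, $dir) } = 1;
            },
        },
        $dir
    );
    return \%seen;
}

# Hypothetical helper: relative paths present in the first set only.
sub only_in_first {
    my ($first, $second) = @_;
    my @only = grep { !exists $second->{$_} } keys %$first;
    @only = sort @only;
    return @only;
}

if (@ARGV == 2) {
    my ($left, $right) = map { relative_files($_) } @ARGV;
    print "< $_\n" for only_in_first($left, $right);
    print "> $_\n" for only_in_first($right, $left);
}
```

No output means both directories hold the same set of relative file names.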
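I have paid no attention to economical memory use or machine power, so please improve it as you see fit.
It runs on my BSD machine, but how about under ActivePerl?
I mentioned this in an entry long ago, but doing MD5 hashing through a CPAN module owes a lot to the last part of the code in ゆーすけべー (Yusukebe)'s entry "いかにして効率よく大量のおっぱい画像をダウンロードするか" ("How to efficiently download a large number of breast images"). (Rumor has it that an improved version and a Ruby version have come out recently.)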
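One cheap saving, offered as a suggestion rather than something the script does: files of different sizes cannot have identical contents, so only files that share a size with another file need hashing at all. A minimal sketch of that pre-filter, with the hypothetical name candidates_by_size:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the paths whose byte size is shared
# with at least one other path; the rest cannot have a duplicate.
sub candidates_by_size {
    my @paths = @_;
    my %paths_of_size;
    for my $path (@paths) {
        my $size = (stat $path)[7];    # element 7 of stat is the size in bytes
        next unless defined $size;     # stat failed: unreadable or vanished file
        push @{ $paths_of_size{$size} }, $path;
    }
    return map { @$_ } grep { @$_ > 1 } values %paths_of_size;
}

print "$_\n" for candidates_by_size(@ARGV);
```

Only the paths this returns would then be passed on to the MD5 step.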
Filed under: FreeBSD, Computer (2012.06.02)