Code Comments
C3 wrote:
> I'm looking for, or willing to write, a program that will take a list of
> files as command-line arguments, and then build up a frequency table of
> n-grams (individual bytes, or strings of 2 or more bytes) for all these
> files.
>
> e.g. ngram 4 file1.txt file2.txt
>
> would return the most frequently occurring sequences of 4 bytes over the two
> files.
>
> I am willing to go quick'n'dirty for this. I understand I need to build up a
> table of all the n-grams that exist in each file. Can someone help me get
> started on this?
Well if it's quick'n'dirty that you want:
perl -lne'BEGIN{$r="."x shift}$h{$1}++while/(?=($r))/g}{print for keys%h' 4 file1.txt file2.txt
John
--
use Perl;
program
fulfillment