Code Comments


Re: Counting most frequently-occurring n-grams in a file (or over
C3 wrote:
> I'm looking for, or willing to write, a program that will take a list of
> files as command-line arguments, and then build up a frequency table of
> n-grams (individual bytes, or strings of 2 or more bytes) for all these
> files.
>
> e.g. ngram 4 file1.txt file2.txt
>
> would return the most frequently occurring sequences of 4 bytes over the two
> files.
>
> I am willing to go quick'n'dirty for this. I understand I need to build up a
> table of all the n-grams that exist in each file. Can someone help me get
> started on this?

Well if it's quick'n'dirty that you want:

perl -lne'BEGIN{$r="."x shift}$h{$1}++while/(?=($r))/g}{print for keys%h' 4 file1.txt file2.txt



John
--
use Perl;
program
fulfillment

John W. Krahn
09-24-04 01:58 PM


PERL Miscellaneous archive
