PERL Miscellaneous, September 2004
counting number of occurrences of every possible substring in multiple files

I am trying to write a program that reads multiple files and prints out the
number of occurrences of n-length byte sequences across these files. The
value of n must be specified on the command-line.
Since I'll be dealing with binary files, I want the ASCII codes of the
characters printed out.
e.g. for n=2 and the following 3 files, contents shown as integers,
f1 = {33, 84, 55}, f2 = {84, 55, 12}, f3 = {33, 84, 55}
I want output like this:
3 84 55
2 33 84
I'll be dealing with files up to about one megabyte in size. Efficiency is
not critical, and it does not matter, say, if a length-2 sequence is a
substring of a length-3 sequence or of a more frequently occurring one.
Values of n will not go above 10.
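The specification above can be sketched in Perl roughly as follows. This is a minimal sketch under a few assumptions the post does not settle: windows are counted within each file separately (so no sequence spans two files), every sequence is printed, including ones seen only once, and ties are ordered by byte value. The names count_ngrams, read_binary and main are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count every n-byte window in each buffer. Windows are taken per file,
# so no sequence spans the boundary between two files (an assumption).
sub count_ngrams {
    my ($n, @buffers) = @_;
    my %count;
    for my $buf (@buffers) {
        # A buffer shorter than $n yields an empty range, hence no windows.
        for my $i (0 .. length($buf) - $n) {
            $count{ substr($buf, $i, $n) }++;
        }
    }
    return \%count;
}

# Read a whole file as raw bytes (no newline or encoding translation).
sub read_binary {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $data = do { local $/; <$fh> };    # slurp mode
    close $fh;
    return defined $data ? $data : '';
}

sub main {
    my ($n, @files) = @_;
    die "usage: $0 N FILE...\n"
        unless $n =~ /^[1-9][0-9]*$/ && @files;
    my $count = count_ngrams($n, map { read_binary($_) } @files);

    # Most frequent first; ties ordered by byte values for stable output.
    for my $key (sort { $count->{$b} <=> $count->{$a} or $a cmp $b }
                 keys %$count) {
        # unpack 'C*' turns each byte of the key into its integer value.
        print join(' ', $count->{$key}, unpack('C*', $key)), "\n";
    }
}

# Only run when given arguments, so the subs can be loaded and tested alone.
main(@ARGV) if @ARGV;
```

Invoked as `perl ngrams.pl 2 f1 f2 f3` (the script name is hypothetical), the three example files should give `3 84 55`, `2 33 84` and `1 55 12`; the last line is the sequence the example output leaves out. A plain hash is the simplest fit since efficiency is not critical, but it can hold up to roughly one key per input byte, so memory grows with the total size of the files.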

Tad McClellan, 2004-09-29, 8:05 pm

C3 <> wrote:
> I am trying to write a program that reads multiple files and prints out the
> number of occurrences of n-length byte sequences across these files. The
> value of n must be specified on the command-line.
>
> Since I'll be dealing with binary files,
perldoc -f binmode
> I want the ASCII codes of the
> characters printed out.
Huh?
If it is a text file, then it contains ASCII codes.
If it is a binary file, then it may contain some other encoding.
Anyway,
perldoc -f chr
perldoc -f ord
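The functions mentioned above combine roughly like this. This is a minimal sketch: it puts the three bytes of f1 from the question into an in-memory filehandle (an assumption, so the example needs no real file); with a real file you would open it by name and call binmode the same way. It also shows unpack 'C*', a standard alternative to calling ord() on each byte.

```perl
use strict;
use warnings;

# The bytes of f1 from the question; an in-memory filehandle stands in
# for a real file so the example is self-contained.
my $bytes = chr(33) . chr(84) . chr(55);
open my $fh, '<', \$bytes or die "open: $!";
binmode $fh;                           # raw bytes: no CRLF or encoding layer
my $data = do { local $/; <$fh> };     # slurp the whole "file"
close $fh;

# ord() returns the numeric value of a single character (here, a byte).
my @codes_ord = map { ord($_) } split //, $data;

# unpack with the 'C' template does the same for every byte in one call.
my @codes_unpack = unpack 'C*', $data;

print "@codes_ord\n";       # 33 84 55
print "@codes_unpack\n";    # 33 84 55

# chr() goes the other way: integer back to a one-byte string.
my $rebuilt = join '', map { chr($_) } @codes_ord;
my $status  = $rebuilt eq $data ? 'round trip ok' : 'mismatch';
print "$status\n";
```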
> e.g. for n=2 and the following 3 files, contents shown as integers,
>
> f1 = {33, 84, 55}, f2 = {84, 55, 12}, f3 = {33, 84, 55}
>
> I want output like this:
> 3 84 55
> 2 33 84
>
> I'll be dealing with files up to about one megabyte in size. Efficiency is
> not critical, and it does not matter, say, if a length-2 sequence is a
> substring of a length-3 sequence or of a more frequently occurring one.
> Values of n will not go above 10.
Did you mean to ask a question?
What is it that you need help with?
Are you asking for someone to write a program to your specification
for you? It kind of sounds that way...
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Paul Lalli, 2004-09-29, 8:05 pm

"C3" <_> wrote in message news:415ac218$0$20582$afc38c87@news.optusnet.com.au...
> I am trying to write a program that reads multiple files and prints out the
> number of occurrences of n-length byte sequences across these files. The
> value of n must be specified on the command-line.
>
> Since I'll be dealing with binary files, I want the ASCII codes of the
> characters printed out.
>
> e.g. for n=2 and the following 3 files, contents shown as integers,
>
> f1 = {33, 84, 55}, f2 = {84, 55, 12}, f3 = {33, 84, 55}
>
> I want output like this:
> 3 84 55
> 2 33 84
>
> I'll be dealing with files up to about one megabyte in size. Efficiency is
> not critical, and it does not matter, say, if a length-2 sequence is a
> substring of a length-3 sequence or of a more frequently occurring one.
> Values of n will not go above 10.
Do you realize that nowhere in here did you ask a question? What is it
you need help with? What part are you stuck on? What have you tried so
far, and how did your attempt fail to work correctly?
Paul Lalli