Archive: PERL Miscellaneous, September 2004

counting number of occurrences of every possible substring in multiple files
C3

2004-09-29, 8:05 pm

I am trying to write a program that reads multiple files and prints out the
number of occurrences of n-length byte sequences across these files. The
value of n must be specified on the command line.

Since I'll be dealing with binary files, I want the ASCII codes of the
characters printed out.

e.g. for n=2 and the following 3 files, contents shown as integers,

f1 = {33, 84, 55}, f2 = {84, 55, 12}, f3 = {33, 84, 55}

I want output like this:
3 84 55
2 33 84

I'll be dealing with files up to about one megabyte in size. Efficiency is
not critical, and it does not matter, say, if a length-2 sequence is a
substring of a length-3 sequence or of a more frequently occurring
sequence. Values of n will not go above 10.
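For readers who want the program the post describes, here is a minimal Perl sketch. It assumes that overlapping windows count (so "aaa" with n=2 gives two "aa"), that a file shorter than n contributes nothing, and that output is sorted by count. The subroutine names (slurp_binary, count_sequences, report_lines) and the script name ngrams.pl are hypothetical, not from the thread.

```perl
#!/usr/bin/perl
# Sketch: count every n-byte sequence across several binary files and
# print "count byte byte ..." lines, most frequent first.
use strict;
use warnings;

# Hypothetical helper: read a whole file as raw bytes.
sub slurp_binary {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;                 # no newline or encoding translation
    local $/;                    # slurp mode: read the file in one go
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}

# Tally overlapping n-byte windows over all buffers.
# A buffer shorter than n yields an empty range and adds nothing.
sub count_sequences {
    my ($n, @buffers) = @_;
    my %count;
    for my $buf (@buffers) {
        for my $i (0 .. length($buf) - $n) {
            $count{ substr($buf, $i, $n) }++;
        }
    }
    return \%count;
}

# Format as "count b1 b2 ...": highest count first, ties in byte order.
# unpack 'C*' turns each byte into its unsigned integer value.
sub report_lines {
    my ($count) = @_;
    return map { join(' ', $count->{$_}, unpack('C*', $_)) }
           sort { $count->{$b} <=> $count->{$a} || $a cmp $b }
           keys %$count;
}

sub main {
    my ($n, @files) = @_;
    die "usage: $0 N FILE...\n"
        unless defined $n && $n =~ /\A[1-9][0-9]*\z/ && @files;
    my $count = count_sequences($n, map { slurp_binary($_) } @files);
    print "$_\n" for report_lines($count);
}

# Run only when executed as a script with arguments,
# e.g. perl ngrams.pl 2 f1 f2 f3
main(@ARGV) if @ARGV && !caller;
```

On the three example files this prints every sequence, so after "3 84 55" and "2 33 84" it also prints "1 55 12". If only repeated sequences are wanted, filter out entries whose count is 1. A 1 MB file has roughly a million windows, so the hash can grow to about that many keys per file in the worst case, which fits the "efficiency is not critical" requirement.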



Tad McClellan

2004-09-29, 8:05 pm

C3 <> wrote:
> I am trying to write a program that reads multiple files and prints out the
> number of occurrences of n-length byte sequences across these files. The
> value of n must be specified on the command line.
>
> Since I'll be dealing with binary files,



perldoc -f binmode
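As a sketch of what that page covers: binmode puts a filehandle into raw byte mode, so no CRLF translation or encoding layer alters the data. The read_bytes helper and the temporary-file demo are illustrative assumptions, not code from the thread; File::Temp is a core Perl module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the entire contents of $path as raw bytes.
sub read_bytes {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;                  # raw bytes: no CRLF or encoding translation
    local $/;                     # slurp mode: no record separator
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}

# Write a CR LF pair (subject to newline translation on some platforms),
# a NUL, and 0xFF (not valid UTF-8 on its own) to a temporary file,
# then read the bytes back unchanged.
my ($out, $tmpname) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\x0d\x0a\x00\xff";
close $out;

my $bytes = read_bytes($tmpname);
printf "%d bytes: %s\n", length $bytes, join(' ', unpack('C*', $bytes));
# prints "4 bytes: 13 10 0 255"
```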


> I want the ASCII codes of the
> characters printed out.



Huh?

If it is a text file, then it contains ASCII codes.

If it is a binary file, then it may contain some other encoding.

Anyway,

perldoc -f chr
perldoc -f ord
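For example, ord turns each character of a byte string into its numeric code, and chr goes the other way. unpack with the 'C*' template, an alternative not mentioned in the thread, returns all the unsigned byte values in one call. The string "T7" is an arbitrary choice whose bytes happen to be 84 and 55, matching the example in the question.

```perl
use strict;
use warnings;

my $seq = "T7";                            # 'T' is 84, '7' is 55
my @codes = map { ord } split //, $seq;    # one ord() per character
print "@codes\n";                          # prints "84 55"

# unpack 'C*' gives the same unsigned byte values in a single call
my @same = unpack 'C*', $seq;
print "@same\n";                           # prints "84 55"

# chr converts the numbers back into a byte string
my $back = join '', map { chr } @codes;
print $back eq $seq ? "round trip ok\n" : "mismatch\n";
```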


> e.g. for n=2 and the following 3 files, contents shown as integers,
>
> f1 = {33, 84, 55}, f2 = {84, 55, 12}, f3 = {33, 84, 55}
>
> I want output like this:
> 3 84 55
> 2 33 84
>
> I'll be dealing with files up to about one megabyte in size. Efficiency is
> not critical, and it does not matter, say, if a length-2 sequence is a
> substring of a length-3 sequence or of a more frequently occurring
> sequence. Values of n will not go above 10.



Did you mean to ask a question?

What is it that you need help with?

Are you asking for someone to write a program to your specification
for you? It kind of sounds that way...


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas



Paul Lalli

2004-09-29, 8:05 pm

"C3" <_> wrote in message
news:415ac218$0$20582$afc38c87@news.optusnet.com.au...
> I am trying to write a program that reads multiple files and prints out the
> number of occurrences of n-length byte sequences across these files. The
> value of n must be specified on the command line.
>
> Since I'll be dealing with binary files, I want the ASCII codes of the
> characters printed out.
>
> e.g. for n=2 and the following 3 files, contents shown as integers,
>
> f1 = {33, 84, 55}, f2 = {84, 55, 12}, f3 = {33, 84, 55}
>
> I want output like this:
> 3 84 55
> 2 33 84
>
> I'll be dealing with files up to about one megabyte in size. Efficiency is
> not critical, and it does not matter, say, if a length-2 sequence is a
> substring of a length-3 sequence or of a more frequently occurring
> sequence. Values of n will not go above 10.


Do you realize that nowhere in here did you ask a question? What is it
you need help with? What part are you stuck on? What have you tried so
far, and how did your attempt fail to work correctly?

Paul Lalli



Copyright 2008 codecomments.com