
Author file processing
amit_h123@yahoo.co.in

2006-04-25, 9:57 pm

Hello all,
I have a large text file with lots of spaces and newline characters in
it, which I want to remove.
After that I need to construct hash tables for the unique words
that are present in the file. That is, I need a hash for
unigrams (one word at a time), a hash for bigrams (2 words at a time),
and the same for trigrams (3 words at a time).
I am lost trying to remove the spaces in the text file, and I
am not able to access each word one at a time.
A simple example of what I need to do:

If the text in my file is:


hello how are you all hello how are.


so my unigrams will be like:
hello 2
how 2
are 2
you 1...


bigrams will be
hello how 2
how are 2
are you 1
you all 1


trigrams
hello how are 2
how are you 1
are you all 1
.....so on


Can anyone help me with this code?
-Thanks

Gunnar Hjalmarsson

2006-04-25, 9:57 pm

amit_h123@yahoo.co.in wrote (in alt.perl):
> Hello all,
> I have a large text file ...


You posted the same job spec. in clpmisc a couple of hours ago, and I
suggested that you learn a programming language. Was that a bad idea?

Didn't you like the hints you were given by another poster either?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Joe Smith

2006-04-25, 9:57 pm

amit_h123@yahoo.co.in wrote:
> I am all lost in removing and accessing the spaces in the text file but


That's super trivial - it's one of the homework assignments given on
the first day of any respectable Perl class.

@words = split; # Where $_ contains the line of text

> And after that i need to construct the hash tables for the unique word


$unigram{$_}++ foreach @words;

> unigrams (one word at a time), a hash for bigrams (2 words at a time)
> and same as for 3 words.


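my (%unigram, %bigram, %trigram);
my ($prev1, $prev2);   # the previous word and the one before it
while (<>) {
    my @words = split;   # no arguments: split $_ on whitespace
    foreach my $word (@words) {
        $unigram{$word}++;
        # Count a bigram/trigram only once enough words have been seen.
        $bigram{"$prev1 $word"}++ if defined $prev1;
        $trigram{"$prev2 $prev1 $word"}++ if defined $prev2;
        $prev2 = $prev1;
        $prev1 = $word;
    }
}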

So what's the problem? It almost sounds as if you never heard of
the split() function, or how it works when given no arguments.

-Joe
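
For readers unsure how split behaves when given no arguments, here is a minimal sketch. The sample line is made up for illustration; the only assumption is standard Perl behavior, where a bare split is equivalent to split(' ', $_).

```perl
use strict;
use warnings;

# Hypothetical sample line with leading, trailing, and repeated whitespace.
$_ = "  hello   how\tare you \n";

# With no arguments, split works on $_, skips leading whitespace,
# and splits on runs of whitespace (spaces, tabs, newlines).
my @words = split;

print join('|', @words), "\n";   # prints hello|how|are|you
```

Because the newline counts as whitespace, no chomp is needed before splitting.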
amit_h123@yahoo.co.in

2006-04-25, 9:57 pm

Hi,
Thanks for the help. I knew about split, but not without arguments.
This really helped.

amit_h123@yahoo.co.in

2006-04-25, 9:57 pm

Hello,
But I still have one problem. Is there a way to access the
bigram hash values on the basis of the trigram keys? I have to calculate
the value as:

For each key in trigram, I have to do the following:
$trigram{"hello how are"} / $bigram{"hello how"}

How do I access these values simultaneously?
Any suggestions?

Matt Garrish

2006-04-25, 9:57 pm


<amit_h123@yahoo.co.in> wrote in message
news:1145798342.838627.116070@i39g2000cwa.googlegroups.com...
> hello ,
> But i still have one problem. Its like is there a way to access the
> bigram hash values on the basis of trigram since i have to calculate
> the value as :
>
> for each key in trigram: I have to do the following thing.
> trigram{ hello how are} / bigram{hello how}
>
> how do i access these values simultaneously..
> any suggestions
>


I suggest you learn to quote some context when posting so people have some
idea what you're talking about. Usenet is not a bulletin board, even if
*you* happen to be using Google Groups and see it that way.

Matt
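
The trigram/bigram question was not answered in the thread. One possible approach, sketched below, assumes the keys are built as space-joined words (as in Joe's loop), so the matching bigram key is simply the first two words of each trigram key. The %ratio hash and the hard-coded sample text are illustrative assumptions, not something from the original posts.

```perl
use strict;
use warnings;

# Sample text from the original post (trailing period dropped for simplicity).
my $text = "hello how are you all hello how are";

my (%bigram, %trigram);
my ($prev1, $prev2);
foreach my $word (split ' ', $text) {
    $bigram{"$prev1 $word"}++         if defined $prev1;
    $trigram{"$prev2 $prev1 $word"}++ if defined $prev2;
    ($prev2, $prev1) = ($prev1, $word);
}

# For each trigram key, rebuild the bigram key from its first two words
# and divide the two counts.
my %ratio;
foreach my $tri (sort keys %trigram) {
    my ($w1, $w2) = split ' ', $tri;
    my $bi = "$w1 $w2";
    $ratio{$tri} = $trigram{$tri} / $bigram{$bi};
    printf "%-16s %d / %d = %.2f\n",
        $tri, $trigram{$tri}, $bigram{$bi}, $ratio{$tri};
}
```

With this sample text, "hello how are" gives 2 / 2 and "how are you" gives 1 / 2. Every trigram's leading bigram is counted in the same pass, so the lookup never hits a missing key.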


Copyright 2008 codecomments.com