PERL Beginners, June 2007

Author String Manipulation
Dharshana Eswaran

2007-06-28, 3:59 am

Hi All,

I have a string in which structures are stored. Each time I read a file,
a different structure is stored in the string. The different possible
structures stored in the string are as shown:

$string = "{
STACK_CC_SS_COMMON_TYPE_REFERENCE_ID_T pp_reference_id;
STACK_CC_SS_COMMON_TYPE_CM_LOCAL_CAUSE_T
generic_cause;
STACK_CC_SS_COMMON_TYPE_CM_LOCAL_CAUSE_T
specific_cause;
STACK_CC_SS_COMMON_TYPE_CHANNEL_INFO_T channel_info;
STACK_REG_COMMON_TYPE_RAB_RB_INFO_T rab_info;
STACK_CC_SS_COMMON_TYPE_L3_MSG_UNIT_T pp_l3_msg;
} STACK_PRIMITIVE_MNCC_MESSAGE_T;
};";

or

$string = "{
UINT8 mms; /* More messages to send */
UINT8 transport_method;
UINT8 mo_rpdu[STACK_MSG_COMMON_TYPE_TF_MAX_VAR_MSG_LEN];
} STACK_PRIMITIVE_MNSMS_EST_REQ_T;
};";

or

$string = "{
STACK_REG_COMMON_TYPE_REG_CAUSE_T pp_reg_cause; /* Reason the
primitive was sent */
STACK_REG_COMMON_TYPE_PLMN_T pp_plmn; /* PLMN MS should
move to */
STACK_REG_COMMON_TYPE_SIM_T pp_sim_type; /* Valid only on
BUTE */
STACK_REG_COMMON_TYPE_NW_MENU_PARAMS_T pp_nw_menu_params; /* Valid
only when pp_reg_cause is
* _CAUSE_NW_MENU_CHANGE,
_CAUSE_POWER_ON */
BOOL cingular_ens_sim_phone; /* Valid when
pp_reg_cause is SIM_INSERT */
BOOL tty_enabled; /* Valid only on BUTE. This is
valid when the reg_cause
* is SIM_INSERT, POWERON
and BANDSWITCH.
* TRUE : restrict RAT to
GSM
* FALSE: Don't restrict RAT
to GSM
*/
} STACK_PRIMITIVE_MNMM_REG_REQ;
};";

From the above structures, I need to extract the data type and the variable
name of each structure member separately for further processing.
For example: Data Type is STACK_REG_COMMON_TYPE_REG_CAUSE_T
Variable Name is pp_reg_cause

I am unable to find a generalised way to extract them, as a few
structures have comments, a few do not have comments, etc.

I need to extract them and store them in a different variable for further
processing.

I kindly request all to guide me in this.

Thanks and Regards,
Dharshana

Tom Phoenix

2007-06-28, 3:59 am

On 6/27/07, Dharshana Eswaran <dharshana.ve@gmail.com> wrote:

> I am unable to find a generalised way to extract them, as a few
> structures have comments, a few do not have comments, etc.


Does the data have some defined grammar, or a definable one at least?
If you are up to using Parse::RecDescent, it will probably do the job.

http://search.cpan.org/author/DCONW.../RecDescent.pod

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training

Chas Owens

2007-06-28, 3:59 am

On 6/27/07, Tom Phoenix <tom@stonehenge.com> wrote:
snip
> Does the data have some defined grammar, or a definable one at least?
> If you are up to using Parse::RecDescent, it will probably do the job.

snip

Many people are afraid to use Parse::RecDescent because of the
learning curve involved. I find that odd given that these people
already use regexes, but perhaps an example will spur people to use
it. This is a simple parser for the strings provided. Given the
structure of the strings I have no doubt that the grammar is
incomplete (for instance, I only allow one dimensional arrays), but it
can probably be extended from here as new examples present themselves.

#!/usr/bin/perl

use strict;
use warnings;
use Parse::RecDescent;

my @string = (
"{
STACK_CC_SS_COMMON_TYPE_REFERENCE_ID_T pp_reference_id;
STACK_CC_SS_COMMON_TYPE_CM_LOCAL_CAUSE_T
generic_cause;
STACK_CC_SS_COMMON_TYPE_CM_LOCAL_CAUSE_T
specific_cause;
STACK_CC_SS_COMMON_TYPE_CHANNEL_INFO_T channel_info;
STACK_REG_COMMON_TYPE_RAB_RB_INFO_T rab_info;
STACK_CC_SS_COMMON_TYPE_L3_MSG_UNIT_T pp_l3_msg;
} STACK_PRIMITIVE_MNCC_MESSAGE_T;
};",
"{
UINT8 mms; /* More messages to send */
UINT8 transport_method;
UINT8 mo_rpdu[STACK_MSG_COMMON_TYPE_TF_MAX_VAR_MSG_LEN];
} STACK_PRIMITIVE_MNSMS_EST_REQ_T;
};",
"{
STACK_REG_COMMON_TYPE_REG_CAUSE_T pp_reg_cause; /* Reason
the primitive was sent */
STACK_REG_COMMON_TYPE_PLMN_T pp_plmn; /* PLMN MS
should move to */
STACK_REG_COMMON_TYPE_SIM_T pp_sim_type; /* Valid
only on BUTE */
STACK_REG_COMMON_TYPE_NW_MENU_PARAMS_T pp_nw_menu_params; /*
Valid only when pp_reg_cause is
*
_CAUSE_NW_MENU_CHANGE, _CAUSE_POWER_ON */
BOOL cingular_ens_sim_phone; /* Valid when
pp_reg_cause is SIM_INSERT */
BOOL tty_enabled; /* Valid only on BUTE. This
is valid when the reg_cause
* is SIM_INSERT,
POWERON and BANDSWITCH.
* TRUE : restrict RAT to GSM
* FALSE: Don't
restrict RAT to GSM
*/
} STACK_PRIMITIVE_MNMM_REG_REQ;
};"
);

# Build the parser from the grammar that follows __DATA__.
my $p = Parse::RecDescent->new(join '', <DATA>) or die "parser error";

for my $s (@string) {
    warn "could not parse [$s]" unless $p->text($s);
}

__DATA__
text: <skip: qr{\s* (/[*] .*? [*]/ \s*)*}sx> '{' statement(s) '}'
      identifier ';' '};' {
        our @vars;
        # the fifth item is the structure name
        print "$item[5]\n@vars";
        @vars = ();
        1; #make sure the rule returns true
    }
statement: identifier identifier array(?) ';' {
        our @vars;
        # the optional array item is a reference to a list of zero or one elements
        my ($type, $var, $elems) = (@item[1,2], $item[3][0]);
        if ($elems) {
            $elems =~ s/\[(.*)\]/$1/;
            $type = "array of $type with $elems elements";
        }
        push @vars, "\tdata type is $type and variable name is $var\n";
        1; #make sure the rule returns true
    }
array: /\[.*?\]/
identifier: /[A-Za-z_][A-Za-z0-9_]*/
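
Where Parse::RecDescent is not available, the same grammar can be approximated by hand. What follows is only a sketch: parse_struct is a hypothetical helper that does not appear in the thread. It walks the string with \G and /gc so that every match has to start where the previous one stopped, and it assumes an array size is a single identifier or number. It mirrors the text, statement, array and identifier rules and covers only the constructs seen in the sample strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: parse one "{ ... } NAME;" block
# and return (struct name, list of [type, variable, array size or undef]).
sub parse_struct {
    my ($src) = @_;

    # Whitespace and /* ... */ comments, skipped before every token
    # (the same job as the <skip: ...> directive in the grammar).
    my $skip  = qr{ (?: (?> \s+ | /\* .*? \*/ ) )* }xs;
    my $ident = qr{ [A-Za-z_][A-Za-z0-9_]* }x;

    pos($src) = 0;
    $src =~ /\G $skip \{ /gcx or return;

    my @fields;
    # statement: identifier identifier array(?) ';'
    while ($src =~ /\G $skip ($ident) \b $skip ($ident) \b $skip
                    (?: \[ $skip ($ident|\d+) $skip \] $skip )? ; /gcx) {
        push @fields, [ $1, $2, $3 ];
    }

    # '}' identifier ';'  (anything after that is ignored)
    $src =~ /\G $skip \} $skip ($ident) $skip ; /gcx or return;
    return ($1, @fields);
}

my $sample = <<'EOS';
{
UINT8 mms; /* More messages to send */
UINT8 transport_method;
UINT8 mo_rpdu[STACK_MSG_COMMON_TYPE_TF_MAX_VAR_MSG_LEN];
} STACK_PRIMITIVE_MNSMS_EST_REQ_T;
};
EOS

my ($name, @fields) = parse_struct($sample);
print "$name\n";
for my $f (@fields) {
    my ($type, $var, $size) = @$f;
    $type = "array of $type with $size elements" if defined $size;
    print "\tdata type is $type and variable name is $var\n";
}
```

Because of the /c flag, a failed match leaves pos() alone, which is what lets the closing-brace check pick up exactly where the statement loop gave up.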

Dharshana Eswaran

2007-06-28, 3:59 am

On 6/28/07, Tom Phoenix <tom@stonehenge.com> wrote:
>
> On 6/27/07, Dharshana Eswaran <dharshana.ve@gmail.com> wrote:
>
>
> Does the data have some defined grammar, or a definable one at least?




The defined grammar here is
{
xyz1 abc1; /*Comments*/
xyz2 abc2;
xyz3 abc3[req];
xyz4 abc4[req]; /*Comments*/
};

Here, I have defined the different possible occurrences of the structure
elements. A regex for extracting xyz1, xyz2, xyz3, xyz4 and
abc1, abc2, abc3[req], abc4[req] would be helpful. Here, the comments are of
no use; I just need to ignore them.

>If you are up to using Parse::RecDescent, it will probably do the job.


I am restricted from using modules and I am unable to come up with a regex
or regexes to do this job.

>http://search.cpan.org/author/DCONW...RecDescent-1.94
>/lib/Parse/RecDescent.pod


>Hope this helps!


>--Tom Phoenix
>Stonehenge Perl Training


Can anyone guide me in this?

Thanks and Regards,
Dharshana

Tom Phoenix

2007-06-28, 3:59 am

On 6/27/07, Dharshana Eswaran <dharshana.ve@gmail.com> wrote:

> I am restricted from using modules and I am unable to come up
> with a regex or regexes to do this job.


So, the Pointy-Haired Boss won't let you use any module that didn't
come with Perl?

Even if you can't use Parse::RecDescent in your final program, you can
still use it to get the job done. If something resembling Chas Owens's
solution works, you can turn a grammar into a module by the means
described under "Precompiling parsers". Then you can pull the Perl
code out of that module, and the PHB never has to know you didn't
write it yourself.

http://search.cpan.org/dist/Parse-R...mpiling_parsers

Cheers!

--Tom Phoenix
Stonehenge Perl Training

Chas Owens

2007-06-28, 3:59 am

On 6/27/07, Dharshana Eswaran <dharshana.ve@gmail.com> wrote:
> On 6/28/07, Tom Phoenix <tom@stonehenge.com> wrote:
>
>
>
> The defined grammar here is
> {
> xyz1 abc1; /*Comments*/
> xyz2 abc2;
> xyz3 abc3[req];
> xyz4 abc4[req]; /*Comments*/
> };
>
> Here, I have defined the different possible occurrences of the structure
> elements. A regex for extracting xyz1, xyz2, xyz3, xyz4 and
> abc1, abc2, abc3[req], abc4[req] would be helpful. Here, the comments are of
> no use; I just need to ignore them.
>
>
> I am restricted from using modules and I am unable to come up with a regex
> or regexes to do this job.
>
>
>
>
> Can anyone guide me in this?
>
> Thanks and Regards,
> Dharshana
>


It is fragile, but here is a set of regexes that parses the string you
mentioned. I did notice that this string differs significantly from
the ones you gave earlier, and this set of regexes will not correctly
handle them.

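#!/usr/bin/perl

use strict;
use warnings;

# Zero or more /* ... */ comments with optional surrounding whitespace.
my $comment    = qr{\s* (?:/\* .*? \*/ \s*)*}xs;
my $identifier = qr{ [A-Za-z_]\w* }xs;

# One declaration: type, variable name, optional array size, then ';'.
my $statement  = qr{
    \s*
    ($identifier)          # data type
    \s+
    ($identifier)          # variable name
    \s*
    (?: \[ (.*?) \] )?     # optional array size
    \s*
    ;
    \s*
    $comment?
}xs;

my $str = <<EOS;
{
xyz1 abc1; /*Comments*/
xyz2 abc2;
xyz3 abc3[req];
xyz4 abc4[req]; /*Comments*/
};
EOS

# In list context, /g returns every capture of every match as one flat list.
my @m = $str =~ /$statement/g;

# Walk that flat list three captures at a time.
my $iter = by_n(3, \@m);

while ((my ($type, $var, $elems) = $iter->()) == 3) {
    if ($elems) {
        $type = "array of $type with $elems elements";
    }
    print "type is $type and variable is $var\n";
}

# Return a closure that hands back the next $n elements on each call.
sub by_n {
    my ($n, $list) = @_;
    my $i = 0;
    return sub {
        return if $i > $#$list;    # exhausted: the empty list ends the loop
        my @ret = @{$list}[$i .. $i + $n - 1];
        $i += $n;
        return @ret;
    };
}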
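
For the longer structures from the first message, one approach that needs no modules is to delete the comments first and only then match the declarations. A minimal sketch, assuming that comments never nest and that every member has the form "type name;" or "type name[size];". The name extract_members is hypothetical and not something from the thread, and the sample is a shortened mix of the structures posted earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: return [type, variable, size]
# triples for every "type name;" or "type name[size];" statement in $src.
sub extract_members {
    my ($src) = @_;

    # Step 1: replace each /* ... */ comment with a space, including
    # comments that span several lines (the /s flag lets . match newlines).
    $src =~ s{/\* .*? \*/}{ }gxs;

    # Step 2: pull out each declaration. \s+ also matches newlines, so a
    # type and its variable may sit on different lines. The closing
    # "} NAME;" line is skipped because it holds only one identifier.
    my @members;
    while ($src =~ / \b ([A-Za-z_]\w*) \s+ ([A-Za-z_]\w*) \s*
                     (?: \[ \s* (\w+) \s* \] \s* )? ; /gx) {
        push @members, [ $1, $2, $3 ];
    }
    return @members;
}

my $struct = <<'EOS';
{
STACK_REG_COMMON_TYPE_REG_CAUSE_T pp_reg_cause; /* Reason the
primitive was sent */
BOOL tty_enabled; /* Valid only on BUTE.
* TRUE : restrict RAT to GSM
* FALSE: Don't restrict RAT to GSM
*/
UINT8 mo_rpdu[STACK_MSG_COMMON_TYPE_TF_MAX_VAR_MSG_LEN];
} STACK_PRIMITIVE_MNMM_REG_REQ;
};
EOS

for my $m (extract_members($struct)) {
    my ($type, $var, $size) = @$m;
    print "Data Type is $type, Variable Name is $var",
        defined($size) ? " (array of $size)" : "", "\n";
}
```

Stripping the comments up front keeps the declaration regex short, at the cost of a second pass over the string.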

Dharshana Eswaran

2007-06-28, 3:59 am

Thank you. But I am unable to understand the working of the code which you
have written. Can you please explain it?

Thanks and Regards,
Dharshana



Chas Owens

2007-06-28, 6:59 pm

On 6/28/07, Dharshana Eswaran <dharshana.ve@gmail.com> wrote:
> Thank you. But I am unable to understand the working of the code which you
> have written. Can you please explain it?
>
> Thanks and Regards,
> Dharshana


What, specifically, do you not understand?
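
For readers of the archive with the same question, the two least obvious parts of the regex script earlier in the thread are probably the match in list context and the by_n iterator. Here is a small sketch of both in isolation. The sample string and the variable names are made up for illustration, and by_n is lightly adapted so that it returns an empty list when it runs out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# With /g in list context, a regex with capture groups returns one flat
# list: every capture of the first match, then of the second, and so on.
# A group that did not take part in a match contributes undef.
my @flat = "a1 b c3" =~ /([a-z])(\d)?/g;
# @flat now holds ('a', '1', 'b', undef, 'c', '3')

# by_n returns a closure that remembers its position $i and hands back
# the next $n elements of the list on every call.
sub by_n {
    my ($n, $list) = @_;
    my $i = 0;
    return sub {
        return if $i > $#$list;    # exhausted: return the empty list
        my @chunk = @{$list}[ $i .. $i + $n - 1 ];
        $i += $n;
        return @chunk;
    };
}

my $next = by_n(2, \@flat);

# A list assignment in scalar context yields the number of elements on
# its right-hand side, so the loop stops once the iterator returns nothing.
while ((my ($letter, $digit) = $next->()) == 2) {
    print "$letter => ", defined($digit) ? $digit : "(no digit)", "\n";
}
```

In the original script the group size is 3 because each statement has three captures: the type, the variable name, and the optional array size.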