| Author |
regexp truncate line
|
|
| Niv (KP) 2006-05-25, 7:06 pm |
| I have a file with many lines of input.
Some lines start with:
ML_nn_XXXX = blah blah blah,
where nn is always a number and XXXX is any alpha string. There will
always be a space before the = sign.
The string may repeat in the file (the blah blah will be different
though)
I need to extract the XXXX from the string and highest value of nn.
I've started with:
if {[regexp {^ML_[0-9]+_[A-Z]+\s} $inline]} {
....
}
which successfully extracts the lines with the string(s).
How do I extract the XXXX and the nn from the regexp please?
Also, how to ignore repetition of the start string in later lines?
file example;
ML_01_FRED = blah balh blah
ML_02_GREG = hoo hah hoo hah
ML_03_KEVIN = thats me that is
ML_01_FRED = raining yet again, soggy WSB at Silverstone?
ML_02_GREG = its really wet here
ML_03_KEVIN = wot me again
So from the above I would like to extract FRED, GREG & KEVIN to a list
and the number 03.
| |
| Glenn Jackman 2006-05-25, 7:06 pm |
| At 2006-05-25 10:39AM, Niv (KP) <kev.parsons@mbda.co.uk> wrote:
[...]
> if {[regexp {^ML_[0-9]+_[A-Z]+\s} $inline]} {
> ....
> }
[...]
> file example;
>
> ML_01_FRED = blah balh blah
> ML_02_GREG = hoo hah hoo hah
> ML_03_KEVIN = thats me that is
> ML_01_FRED = raining yet again, soggy WSB at Silverstone?
> ML_02_GREG = its really wet here
> ML_03_KEVIN = wot me again
>
> So from the above I would like to extract FRED, GREG & KEVIN to a list
> and the number 03.
>
See http://www.tcl.tk/man/tcl8.4/TclCmd/regexp.htm
You want to use capturing parentheses in your expression to grab nn and
xx, then use an array to store the most recent nn for xxx:
if {[regexp {^ML_(\d+)_([[:alpha:]]+)\s} $inline dummy nn xxx]} {
set mynumber($xxx) $nn
}
#... later, after processing the whole file
parray mynumber
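
For illustration, here is a minimal sketch of that approach run over the sample lines from the original post. The variable names lines, highest, n and names are assumptions added for this example, not part of the snippet above. The array keys take care of the repeated names, and scan is used so that a leading zero in nn is read as decimal rather than octal:

```tcl
# Sample lines copied from the original post.
set lines [list \
    {ML_01_FRED = blah balh blah} \
    {ML_02_GREG = hoo hah hoo hah} \
    {ML_03_KEVIN = thats me that is} \
    {ML_01_FRED = raining yet again, soggy WSB at Silverstone?} \
    {ML_02_GREG = its really wet here} \
    {ML_03_KEVIN = wot me again}]

set highest 0
foreach inline $lines {
    if {[regexp {^ML_(\d+)_([[:alpha:]]+)\s} $inline dummy nn xxx]} {
        # One array element per name, so repeated names collapse.
        set mynumber($xxx) $nn
        # Read nn as a decimal integer ("08" would not be valid octal).
        scan $nn %d n
        if {$n > $highest} {
            set highest $n
        }
    }
}

# The array keys are the unique names; sort them for a stable order.
set names [lsort [array names mynumber]]
puts $names
puts [format %02d $highest]
```

With the six sample lines this should print FRED GREG KEVIN and then 03.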
--
Glenn Jackman
Ulterior Designer
| |
| Niv (Kev Parsons) 2006-05-26, 4:10 am |
| I've managed to extract the highest number OK using regexp, but not
quite sure how to extract the last bit of the name from the string. My
code snippet so far looks like this:
#--------------------------------------------------------------------------------
set hdp_file "D:/tcl_8_4_11/SCRIPTS/test.hdp"
# Open the .hdp file in read mode.
set f1 [open $hdp_file r]
set upper_module_number 0
while {[gets $f1 inline] >= 0} {
    if {[string length $inline] > 0} {
        # Extract the full Module Level prefix, number & name to full_name
        if {[regexp {^ML_[0-9]+_[A-Z]+} $inline full_name]} {
            set temp_number 0
            if {[regexp {[0-9]+} $full_name mynum]} {
                # scan reads nn as decimal; expr would take a leading 0 as octal
                scan $mynum %d temp_number
                if {$temp_number > $upper_module_number} {
                    set upper_module_number $temp_number
                }
            }
        }
    }
}
puts $upper_module_number
close $f1
#--------------------------------------------------------------------------------
This extracts the ML_nn_NAME from the start of the whole line and then
extracts the nn as number from
that and then passes out the highest value of nn found. But how do I
now extract the "NAME" part from my $full_name please? nn could be
any number of digits.
| |
| Schelte Bron 2006-05-26, 4:10 am |
| Niv (Kev Parsons) wrote:
> But how do I now extract the "NAME" part from my $full_name
> please? nn could be any number of digits.
Change the line with your regexp command to:
if {[regexp {^ML_[0-9]+_([A-Z]+)} $inline full_name name_only]} {
Schelte
--
set Reply-To [string map {nospam schelte} $header(From)]
| |
| Niv (Kev Parsons) 2006-05-26, 4:10 am |
| Sorted it myself with:
{[regexp {^(ML_)([0-9]+)_([A-Z]+)} $inline full_name prefix mod_num mod_name]}
Didn't realise that you could group the regexp into subgroups; really
easy once you know.
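
A small sketch of what each variable ends up holding, using one of the sample lines from the first post (the puts lines are only there for illustration). The first variable after the input string receives the whole match, and each following variable receives one parenthesised subgroup, left to right:

```tcl
set inline {ML_03_KEVIN = thats me that is}

if {[regexp {^(ML_)([0-9]+)_([A-Z]+)} $inline full_name prefix mod_num mod_name]} {
    puts "full_name = $full_name"   ;# whole match
    puts "prefix    = $prefix"      ;# first group:  (ML_)
    puts "mod_num   = $mod_num"     ;# second group: ([0-9]+)
    puts "mod_name  = $mod_name"    ;# third group:  ([A-Z]+)
}
```

For this line that should give ML_03_KEVIN, ML_, 03 and KEVIN respectively.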
Kev P.
| |
| Donal K. Fellows 2006-05-26, 8:05 am |
| Glenn Jackman wrote:
> You want to use capturing parentheses in your expression to grab nn and
> xx, then use an array to store the most recent nn for xxx:
>
> if {[regexp {^ML_(\d+)_([[:alpha:]]+)\s} $inline dummy nn xxx]} {
> set mynumber($xxx) $nn
> }
FWIW, many Tclers (myself included) use '->' for the name of the dummy
variable as that makes the code look "more beautiful", like this:
if {[regexp {ML_(\d+)_(\w+)} $inline -> num who]} {
set mynumber($who) $num
}
Note that there is *nothing* special about the -> variable name, other
than the fact that the shortcut syntax with $ doesn't like it; to read
it we would have to write [set ->]. As if we cared. :-)
Donal.
| |
| Glenn Jackman 2006-05-26, 7:06 pm |
| At 2006-05-26 09:02AM, Donal K. Fellows <donal.k.fellows@manchester.ac.uk> wrote:
> Note that there is *nothing* special about the -> variable name, other
> than the fact that the shortcut syntax with $ doesn't like it; to read
> it we would have to write [set ->]. As if we cared. :-)
>
> Donal.
Pedantically, we can use the "medium-cut" syntax ${->}
(I care so much it hurts)
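
For anyone who does care, a quick sketch showing both ways of reading the -> variable back after a match. The names whole1 and whole2 are made up for this example, and the input is one of the sample lines from the first post:

```tcl
set inline {ML_02_GREG = its really wet here}

if {[regexp {ML_(\d+)_(\w+)} $inline -> num who]} {
    # "->" is an ordinary variable name; it holds the whole match.
    set whole1 [set ->]   ;# one-argument form of set
    set whole2 ${->}      ;# braced variable substitution
    puts "$whole1 | $whole2 | $num | $who"
}
```

Both forms should return the same string, here ML_02_GREG.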
--
Glenn Jackman
Ulterior Designer
|