Home > Archive > Matlab > April 2005 > regex question
| stroller 2005-04-15, 4:02 pm |
| hi, i'm trying to do some parsing in matlab and would like to parse the string
'a = 3 b = 4 c = 5 d = 6' such that i get an array of name/value pairs
ie: myArray(a) = 3
myArray(b) = 4
myArray(c) = 5 etc
also, i don't know how many pairs of values will be in the original
string
can anyone help a matlab newbie with some kind of regex here?
thx
| |
| Michael Robbins 2005-04-16, 4:02 am |
| > hi, i'm trying to do some parsing in matlab and would like to
> extract
> 'a = 3 b = 4 c = 5 d = 6' such that i get an array of pairs of
> values
>
>
> ie: myArray(a) = 3
> myArray(b) = 4
> myArray(c) = 5 etc
>
> also, i don't know how many pairs of values will be in the original
> string
I don't have MATLAB in front of me, so excuse any errors. Just let
me know if it doesn't work and I'll fix it for you.
You could do this:
regexprep(yourstring, ...
['MyArray\(([^\)]+)' ...
'\)\s*=\s*(\d+)\s*'], ...
'$1 = $2');
But if that's all you need, you shouldn't use a regular expression; use
SSCANF instead.
You could use a much more flexible expression, one that sets what's
in the parentheses to what's after the equals sign. Let me know if
the above expression doesn't suit you.
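
For reference, the SSCANF route mentioned above might look roughly like this. It is only a sketch under strong assumptions: every name is a single character, every value is an integer, and the input is the example string from the first post. The variable names are illustrative.

```matlab
% Sketch only: assumes single-character names and integer values.
str  = 'a = 3 b = 4 c = 5 d = 6';   % example input from the first post
vals = sscanf(str, '%c = %d ');     % format is reapplied until the input runs out
vals = reshape(vals, 2, []);        % row 1: character codes, row 2: values
names  = char(vals(1, :));          % codes converted back to characters
values = vals(2, :);                % numeric values, in order of appearance
```

Because the format mixes %c and %d, SSCANF returns one numeric column with the characters stored as codes, which is why the CHAR call is needed afterwards.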
| |
| stroller 2005-04-16, 4:02 am |
| hi, thx for getting back to me :)
>
> You could do this:
>
> regexprep(yourstring, ...
> ['MyArray\(([^\)]+)' ...
> '\)\s*=\s*(\d+)\s*'], ...
> '$1 = $2');
>
> But if that's all you need, you shouldn't use regular expression,
> use
> SSCANF.
>
but how do i put this into a loop to collect up all the x=y
assignments? i have no idea how many to expect when i parse this
> You could use a much more flexible expression, one that sets what's
> in the parenthesis to what's after the equals sign. Let me know if
> the above expression doesn't suit you.
i would be interested in seeing a more flexible expression, pray tell
in general there are going to be a lot of scenarios where i don't know
how many of a particular pattern will occur in my string, so a good way to
collect all these up would be very useful
i'm trying to write in matlab what i would usually write in
flex/bison... is this just a bad idea??
thx
| |
| stroller 2005-04-16, 4:02 am |
| also, if i were doing this in perl i could do
my $str = 'a = 3 b = 4 c = 5 d = 6';
$str =~ s/[^a-z0-9]+/ /g;
my %hash = split / /, $str;
print "$_ = $hash{$_}\n" for keys %hash;
can i do anything like this in matlab?
| |
| Michael Robbins 2005-04-16, 4:02 am |
| >> regexprep(yourstring, ...
> but how do i put this in to a loop to collect up all the x=y
> assignments? i have no idea how many to expect when i parse this
If you want to make the actual assignments in the matlab workspace
then use
regexprep(yourstring, ...
['MyArray\(([^\)]+)' ...
'\)\s*=\s*(\d+)\s*'])
You will then get two sets of tokens, the variable letters and the
numbers.
You can use ASSIGNIN to assign the numbers to the letters.
I'm not sure of the exact syntax, but something like
assignin('caller',token{i}{1},str2num(token{i}{2}))
Try not to use EVAL if you can avoid it.
> i would be interested in seeing a more flexible expression, pray tell
> in general there are going to be a lot of scenarios where i don't
> know how many of a particular pattern will occur in my string, so a
> good way to collect all these up would be very useful
Give me an idea of what other forms they may take.
> i'm trying to write in matlab what i would usually write in
> flex/bison... is this just a bad idea??
I'm not familiar with that language, but MATLAB has pretty decent
text manipulation capabilities now. It's a little different from
many text-oriented languages since it is primarily C-like and
designed for traditional calculations. That often becomes an
advantage when using it for text manipulation. Many tricks are
available to you if you remember that, to MATLAB, a text string is
just an array of ASCII codes.
You should become *very* familiar with the following functions if you
want to do parsing in MATLAB. There are many useful ones, but these
come to mind:
REPMAT
SETDIFF
DIFF
INTERSECT
+
CHAR
CELLSTR
SSCANF
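
A small illustration of the "string is just an array of character codes" point, as a sketch only. The variable names are made up for the example.

```matlab
s = 'a = 3';
codes    = double(s);              % numeric character codes, one per character
isLetter = s >= 'a' & s <= 'z';    % elementwise comparison gives a logical mask
letters  = s(isLetter);            % logical indexing works on text like on numbers
shifted  = char(letters + 1);      % arithmetic on the codes, converted back to text
```

Comparison, indexing and arithmetic all work elementwise here, which is what makes functions such as DIFF, SETDIFF and INTERSECT usable on text.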
| |
| Michael Robbins 2005-04-16, 4:02 am |
| In the previous post, I meant you should use REGEXP not REGEXPREP to
get the tokens.
> also, if i were doing this in perl i could do
>
> my $str = 'a = 3 b = 4 c = 5 d = 6';
> $str =~ s/[^a-z0-9]+/ /g;
> my %hash = split / /, $str;
> print "$_ = $hash{$_}\n" for keys %hash;
>
> can i do anything like this in matlab?
Yes, what I showed you should do it. You can split pretty easily but
why bother when REGEXP will give you the tokens directly?
The cell format of the output of REGEXP is a little unwieldy. I
stumble on it now and again, but I suppose it's difficult to have
such a flexible output and make it easy to anticipate and manipulate.
Functions that will help you with the cell-cellstr format include
CELL,CELLSTR,ISCELL,ISCELLSTR.
You must drill down into the structure until you find a cellstring
and then you can use {:} to extract the string.
while ~iscellstr(yourtokens) && iscell(yourtokens)
yourtokens = yourtokens{:};
....
I don't mean that literally, because the structure of your output may
be very complex.
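
For the simple name = value case, one way to avoid drilling down by hand is to stack the per-match token cells into a single table-like cell array. This is a sketch that assumes the example string from the first post and a simplified pattern; it is not the exact code discussed above.

```matlab
str = 'a = 3 b = 4 c = 5 d = 6';
% 'tokens' returns a 1-by-N cell; each element is a 1-by-2 cell of strings.
tok = regexp(str, '(\w+)\s*=\s*(\d+)', 'tokens');
pairs  = vertcat(tok{:});             % N-by-2 cell: names in column 1, values in column 2
names  = pairs(:, 1)';                % 1-by-N cell array of name strings
values = str2double(pairs(:, 2))';    % 1-by-N numeric row
```

After the VERTCAT step the data is an ordinary N-by-2 cell array, so no loop over nested cells is needed.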
| |
| stroller 2005-04-16, 4:02 am |
| >
> If you want to make the actual assignments in the matlab workspace
> then use
>
> regexprep(yourstring, ...
> ['MyArray\(([^\)]+)' ...
> '\)\s*=\s*(\d+)\s*'])
>
> You will then get two sets of tokens, the variable letters and the
> numbers.
>
sorry, i'm a matlab newbie and i didn't quite understand this..
when i type in this command i get nothing returned... MyArray is not
set either...
i guess i'd like to see a code fragment that populates MyArray so i
can understand this a little better
the language i'm parsing could have
a = 1 b = 2 c = 3 .... and so on, there's no way of knowing how many
of these will show up in the files i get, so i need something that
will build the array for me in this scenario
again, sorry for the newbie questions
| |
| Michael Robbins 2005-04-16, 8:58 am |
| > when i type in this command i get nothing returned... MyArray is
> not set either...
MATLAB is not like perl, it is a traditional language and you must
assign values. Also I made the error of typing "MyArray" in the
regex instead of "myArray." Using REGEXPI instead of REGEXP will
ignore case if you want to avoid a similar error.
I don't have MATLAB here to test the code so I'm prone to typos.
yourstring=sprintf(['myArray(a) = 3 \n' ...
'myArray(b) = 4 \nmyArray(c) = 5\n');
[startn endn extents match tokens names] = ...
regexp(yourstring, ...
'MyArray\(([^\)]+)\)\s*=\s*(\d+)\s*');
> i guess i'd like to see a code fragment that populates MyArray so i
> can understand this a little better
In this example, the variable TOKENS will contain both your variable
names and your values (in string format). The docs are here <http://www.mathworks.com/access/hel...ref/regexp.html>
Without matlab, I'm not sure exactly what form TOKENS will take, so
you will have to play with the syntax to extract the data. You may
have to type TOKENS{i}{1} to get the variable name and
STR2NUM(TOKENS{i}{2}) to get the value associated with that variable.
It may be some other similar syntax.
If that is the case,
for i=1:length(tk)
assignin('caller',tokens{i}{1},str2num(tokens{i}{2});
end;
should assign the values. Again, my syntax for using TOKENS may be
off. Help for ASSIGNIN is available here <http://www.mathworks.com/access/hel...f/assignin.html>
> the language i'm parsing could have
> a = 1 b = 2 c = 3 .... and so on, there's no way of knowing how
> many
> of these will show up in the files i get, so i need something that
> will build the array for in this scenario
This regex should handle any number of variable-value pairs. It will
match any variable in the parentheses with the integer immediately
following the equals sign.
| |
| Michael Robbins 2005-04-16, 9:00 pm |
| I finally got to test the code and I missed a closing parenthesis.
Here's working code:
yourstring=sprintf(['myArray(a) = 3 \n' ...
'myArray(b) = 4 \nmyArray(c) = 5\n']);
[startn endn extents match tokens names] = ...
regexp(yourstring, ...
'myArray\(([^\)]+)\)\s*=\s*(\d+)\s*');
for i=1:length(tokens)
assignin('caller',tokens{i}{1},str2num(tokens{i}{2}));
end;
fprintf('\nINPUT\n%s\n',yourstring);
fprintf('OUTPUT\na=%d\nb=%d\nc=%d\n',[a b c].');
INPUT
myArray(a) = 3
myArray(b) = 4
myArray(c) = 5
OUTPUT
a=3
b=4
c=5
| |
| Jason Breslau 2005-04-18, 4:01 pm |
| Here are a couple of regexp tips to help with this problem.
The following assumes you have a variable in your workspace:
If you want to use the tokens output, you can ask for it directly:
However, anytime you find yourself using the tokens output, you should
consider using named tokens:
This returns a structure with fields named 'lhs' and 'rhs' which
correspond to the names given in the expression. Structures in MATLAB
will serve the purpose that you are used to getting from Perl in the
form of associative arrays.
Using the names structure you can convert it to a more useful structure
like this:
myStruct =
a: '3'
b: '4'
c: '5'
d: '6'
Good luck!
-=>J
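
The code blocks in this post did not survive in the archived copy. The steps it describes can be sketched roughly as follows. Assumptions: the input is the example string from the first post, the variable is called str, and the value pattern is simplified to integers (\d+); only the lhs/rhs token names and the cell2struct call are taken from the thread.

```matlab
% Assumed workspace variable (name and content are illustrative).
str = 'a = 3 b = 4 c = 5 d = 6';

% Ask for the tokens output directly by name.
tokens = regexp(str, '(\w+)\s*=\s*(\d+)', 'tokens');

% Named tokens: one struct per match, with fields lhs and rhs.
names = regexp(str, '(?<lhs>\w+)\s*=\s*(?<rhs>\d+)', 'names');

% Fold the matches into one struct whose field names are the lhs strings.
myStruct = cell2struct({names.rhs}', {names.lhs});
```

The values stay as strings, as in the output shown above; str2double(myStruct.a) would give the number.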
| |
| stroller 2005-04-20, 4:02 am |
| thx very much Jason and Michael. i understand regexp much better now
after your generous insights.
Jason Breslau wrote:
>
>
> Here are a couple of regexp tips to help with this problem.
>
> The following assumes you have a variable in your workspace:
>
>
> If you want to use the tokens output, you can ask for it directly:
>
>
> However, anytime you find yourself using the tokens output, you
> should
> consider using named tokens:
>
> '(?<lhs>\w+)\s*=\s*(?<rhs>\d*(\.\d+)?)','names');
>
> This returns a structure with fields named 'lhs' and 'rhs' which
> correspond to the names given in the expression. Structures in
> MATLAB
> will serve the purpose that you are used to getting from Perl in
> the
> form of associative arrays.
>
> Using the names structure you can convert it to a more useful
> structure
> like this:
>
>
> myStruct =
>
> a: '3'
> b: '4'
> c: '5'
> d: '6'
>
> Good luck!
>
> -=>J
>
| |
| stroller 2005-04-21, 4:02 am |
| sorry to keep bugging you guys on this subject but i had one more
question on this...
if i have
str = 'aa (b c d) ee ff=1.7e-006 gg=4.1 hh=1 ii=on'
where the (b c d) could be of any length eg: (b c d e f)
and the name=value pairs at the end could be of any length eg:
ff=1.7e-006 gg=4.1 hh=1 ii=on jj=sf kk=i3 etc..
i tried:
inst = regexp( str,
'(?<name>\S+)\s+\(?<pins>[^)]+)+\)\s+(<masterNm>\S+)
\s+((?<lhs>\w+)\s*=\s*(?<rhs>\w+))+','names');
but this didn't work
thx again..
| |
| Jason Breslau 2005-04-21, 4:03 pm |
| Unfortunately, this can't be done in a one shot deal with regexp.
Nested tokens aren't supported, so the lhs/rhs part you should do as a
second call.
Try this out: (I also fixed a couple of typos in your pattern)
And, if you want to pretty this up a bit:
Enjoy!
-=>J
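
The code from this reply is missing from the archived copy. A rough sketch of the two-call approach it describes, not the original code: the first call captures the fixed fields and keeps the whole name=value tail as one token, and the second call splits that tail. The token name params and the variables pv, params and pins are assumptions made for illustration.

```matlab
str = 'aa (b c d) ee ff=1.7e-006 gg=4.1 hh=1 ii=on';

% First call: fixed fields, with the entire parameter tail as a single token.
inst = regexp(str, ...
    '(?<name>\S+)\s+\((?<pins>[^)]+)\)\s+(?<masterNm>\S+)\s+(?<params>.*)', ...
    'names');

% Second call: split the tail into lhs/rhs pairs.
pv = regexp(inst.params, '(?<lhs>\w+)\s*=\s*(?<rhs>\S+)', 'names');

% Tidy up: one struct of parameters and a cell array of pin names.
params = cell2struct({pv.rhs}', {pv.lhs});
pins   = regexp(inst.pins, '\S+', 'match');
```

All values remain strings (for example params.ii is 'on'), so numeric ones still need str2double where appropriate.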
stroller wrote:
>
> sorry to keep bugging you guys on this subject but i had one more
> question on this...
>
> if i have
>
> str = 'aa (b c d) ee ff=1.7e-006 gg=4.1 hh=1 ii=on'
>
> where the (b c d) could be of any length eg: (b c d e f)
>
> and the name=value pairs at the end could of any length eg:
> ff=1.7e-006 gg=4.1 hh=1 ii=on jj=sf kk=i3 etc..
>
> i tried:
>
> inst = regexp( str,
> '(?<name>\S+)\s+\(?<pins>[^)]+)+\)\s+(<masterNm>\S+)
> \s+((?<lhs>\w+)\s*=\s*(?<rhs>\w+))+','names');
>
> but this didn't work
>
> thx again..
| |
| stroller 2005-04-28, 9:00 am |
| >
> Using the names structure you can convert it to a more useful
> structure
> like this:
>
>
> myStruct =
>
> a: '3'
> b: '4'
> c: '5'
> d: '6'
>
> Good luck!
for a while i couldn't figure this out until i realised i had a typo
where i'd missed the ' in the cell2struct call
so i had myStruct = cell2struct({names.rhs}, {names.lhs})
instead of
myStruct = cell2struct({names.rhs}', {names.lhs})
so now it works, but what is the significance of the ' ??
thx
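
On the question of the ': it is the transpose operator. As far as I understand it, {names.rhs} is a 1-by-N row cell, and cell2struct with two arguments folds along dimension 1, so it needs N rows to match N field names. Transposing turns the row into an N-by-1 column. A small sketch with made-up stand-in data:

```matlab
vals = {'3', '4', '5', '6'};        % 1-by-4 row, like {names.rhs}
keys = {'a', 'b', 'c', 'd'};        % like {names.lhs}

rowSize = size(vals);               % one row, four columns
colSize = size(vals');              % four rows, one column after transposing

s1 = cell2struct(vals', keys);      % dimension 1 has 4 entries, one per field name
s2 = cell2struct(vals, keys, 2);    % same result: fold along dimension 2 instead
```

Without the transpose (and without the third argument) the single row does not match the four field names, so the call errors out.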