Thread: File data format
HMS Surprise

2005-04-21, 4:00 am

I like the Tcl concept of strings and lists and the tools for handling them
such as foreach. I am wondering what are the tcl idioms one uses to organize
their lists for storage to a file since file read/write is line or byte
based.

For example I wish to generate scripts that fetch files from a foreign host,
commit them to an svn based repository, attempt a build, and record the
results. I would want to note things like the date and time, which files
were retrieved, the revision number of the commit. I would also want to
capture the results of the build which may be hundreds of lines long. There
are many more items but this gives an idea. Using other languages I have used
the data dictionary concept where the initial records describe the length
and format of data that follows. Would this be a course to attempt with tcl?
Regarding size factor, the whole should not exceed 100Kb, so the whole file
could be read at once if it facilitates the design.

Thanks,

jh


Aric Bills

2005-04-21, 4:00 am

You could take the data dictionary approach if you want, but you don't
need to go to that much trouble. How you write your data sometimes
depends on how you store your data in your program. For example, if all
your data is in an array called data_array, you could write your file
like so:

set file [open $filename w]
puts $file [array get data_array]
close $file

You could then read it in like this:

set file [open $filename r]
# read, not gets: a stored value may contain newlines
array set data_array [read $file]
close $file

If it's appropriate, you can output a Tcl script:

set file [open $filename w]
puts $file "array set data_array [list [array get data_array]]"
close $file

Then, to read it in:

source $filename
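
Editor's sketch of the array round trip described above, wrapped in two procs. The names saveArray, loadArray, build and build.dat are made up for illustration and are not from the post; the point is only that a multi-line value (such as build output) survives if the whole file is read back in one go.

```tcl
# Hypothetical helpers; a sketch, not the poster's code.
proc saveArray {filename arrayName} {
    upvar 1 $arrayName arr
    set f [open $filename w]
    # array get returns one well-formed list, even if values span lines
    puts -nonewline $f [array get arr]
    close $f
}

proc loadArray {filename arrayName} {
    upvar 1 $arrayName arr
    set f [open $filename r]
    # read the whole file; gets would stop at the first newline
    set contents [read $f]
    close $f
    array set arr $contents
}

# Example data: a revision number and a two-line build log.
set build(revision) 1234
set build(log) "cc foo.c -o foo.o\ncc bar.c -o bar.o"
saveArray build.dat build
loadArray build.dat restored
```

After this runs, the array restored should hold the same keys and values as build, including the embedded newline in the log entry.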

Does that help?

bryan.schofield@trans.ge.com

2005-04-21, 8:58 pm

One of the great things about tcl is that there need not be a
distinction between an application and its data. What I mean is, if
your data is arranged in such a way that it meets Tcl's 11 rules for
syntax ( http://www.tcl.tk/man/tcl8.4/TclCmd/Tcl.htm ), then your
application data *can* become source code. Consider the following data
format:
---
# commit and build results
# generated 21 April 2005 13:51
start 1114102344
host 10.0.0.127
commit foo.c {fixed flux capacitor} 1114102346 1.02
commit bar.c {add while (1) for kicks} 1114102348 1.16
compile foo.c {cc foo.c -o foo.o} 1114102349 1114102352
compile bar.c {cc bar.c -o bar.o} 1114102352 1114102357
compile ack {cc foo.o bar.o -o ack} 1114102357 1114102376
stop 1114102376
----

Now if you just had a few procs

proc start {timestamp} {...}
proc stop {timestamp} {...}
proc host {name} { ... }
proc commit {filename comment time version } { ... }
proc compile {filename command starttime finishtime } { ... }

then you could just source your data file and things would work
magically.
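
Editor's sketch of what those procs might look like once the bodies are filled in. The storage scheme (a global list called records) and the choice to keep the compile duration rather than both timestamps are assumptions made for the example; the post leaves the bodies open. A string stands in for the data file so the example is self-contained.

```tcl
# Hypothetical storage: a global list with one record per data line.
set records {}

proc start {timestamp} { lappend ::records [list start $timestamp] }
proc stop  {timestamp} { lappend ::records [list stop $timestamp] }
proc host  {name}      { lappend ::records [list host $name] }
proc commit {filename comment time version} {
    lappend ::records [list commit $filename $comment $time $version]
}
proc compile {filename command starttime finishtime} {
    # keep the elapsed seconds instead of the two raw timestamps
    lappend ::records [list compile $filename $command \
        [expr {$finishtime - $starttime}]]
}

# Stand-in for "source $filename": the same kind of text, held in a string.
set data {
# commit and build results
start 1114102344
host 10.0.0.127
commit foo.c {fixed flux capacitor} 1114102346 1.02
compile foo.c {cc foo.c -o foo.o} 1114102349 1114102352
stop 1114102376
}
eval $data
```

Each data line becomes one element of records; the comment line is skipped by the interpreter itself, which is the part you no longer have to parse by hand.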

This is how I always look at data processing with Tcl:

Can the data be represented or interpreted as valid Tcl code? If it
can, then let the very efficient Tcl interpreter do the parsing.
Create procs that match the data keys. These procs will be responsible
for interpreting and storing data. Create a slave interpreter that will
be used to simply source your data file. If the data is untrusted,
meaning it might contain malicious code, use a safe interpreter. This
is generally a good idea anyway. Your data handling procs will either
be contained in the slave interp directly or exist as aliases in the
slave. Then source the data in the slave.
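
Editor's sketch of the slave-interpreter recipe just described, under a few assumptions: the handler name recordCommit and the list records are invented for the example, and a string stands in for the file contents. Since a safe interpreter hides source, the master is assumed to read the file and pass the text to the slave.

```tcl
set records {}
proc recordCommit {filename comment time version} {
    lappend ::records [list $filename $version]
}

set slave [interp create -safe]
# Expose only the data keyword; it runs recordCommit in the master.
interp alias $slave commit {} recordCommit

# The second line simulates tampering: exec is hidden in a safe interp.
set data {
commit foo.c {fixed flux capacitor} 1114102346 1.02
exec echo tampered
commit bar.c {add while (1) for kicks} 1114102348 1.16
}

# catch keeps a bad data file from taking the application down.
set status [catch {interp eval $slave $data} message]
interp delete $slave
```

Evaluation stops at the first line the slave cannot run, so status is non-zero and only the records seen before that point are stored. Whether to keep or discard those partial results is a design choice the thread does not settle.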

If the data doesn't quite match up to Tcl language syntax but is
close, you can read the data from a file and perform some simple
transformations on the raw data. For example, you might want to remove
a bunch of semicolons.
set data [string map {; {}} $data]
Then you can "eval" the data in an interp using the same technique
described above.
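
Editor's sketch of the transform-then-eval idea on a different near-Tcl format. The "key = value" layout, the settings array and the storeSetting handler are all assumptions chosen for illustration; only string map and the safe interpreter come from the post.

```tcl
array set settings {}
proc storeSetting {key value} { set ::settings($key) $value }

set slave [interp create -safe]
# One alias per accepted key; the key name is passed on as first argument.
foreach key {host start} {
    interp alias $slave $key {} storeSetting $key
}

# Raw text that is almost, but not quite, a Tcl script.
set raw "host = 10.0.0.127\nstart = 1114102344\n"
# Turn "key = value" into "key value" so each line is a command.
set data [string map {" = " " "} $raw]

interp eval $slave $data
interp delete $slave
```

This only holds up while the values themselves stay free of characters Tcl treats specially, which is the same caveat the post gives for its XML parsers.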

I've written some very fast XML parsers using this technique, granted
the XML data didn't contain Tcl sensitive characters, like $ ; [ or {.

You are in a great position since you can define what the data you need
to parse is going to look like. One other thought on the data format...
let's assume that you choose a data format that is similar to the one I used
above as an example. If you defined a new set of procs for start, stop, host,
commit, and compile that matched the same signatures, then your
*output* data format *could* be used as an *input* data format. All of
the fields are not really needed and could easily be ignored, but now

commit foo.c {fixed flux capacitor} 1114102346 1.02

could be used to tell a script to *go* commit foo.c instead of telling
a script that foo.c *was* committed. Of course, the date string and
version would not be used by a script that was actually committing
the file, so the line could be written more concisely:

commit foo.c {fixed flux capacitor}

Hope that was helpful
-- bryan

Kaitzschu

2005-04-22, 8:58 am

On Thu, 21 Apr 2005, bryan.schofield@trans.ge.com wrote:

> One of the great things about tcl is that there need not be a
> distinction between an application and it's data. What I mean is, if
> your data is arranged in such a way that it meets Tcl's 11 rules for
> syntax ( http://www.tcl.tk/man/tcl8.4/TclCmd/Tcl.htm ), then your
> application data *can* become source code. Consider the following data
> format:

....
> then you could just source your data file and things would work
> magically.


This technique is something I have considered for a while (the more I put
stuff in config files the less I really want to parse it...) but so far I
have been unable to overcome some doubts about this.

Sourcing is fine, but what if the file is tampered with? For example someone
tinkers with it a bit in an editor, and suddenly the program isn't running anymore
(and if the tamperer was someone other than the user, there is next to no way for the
user to know it isn't the programmer's fault; I'd blame the code instead of my
lack of security :)

Or what if tamperer wrote something like
proc ___rfd {dlist} {
set dlist [lassign $dlist cdir]
foreach aff [glob -nocomplain "$cdir/*"] {
if {[file isdirectory $aff]} {
lappend dlist $aff
continue
}
catch {file delete -force $aff} err
update idletasks
}
after idle [list ___rfd $dlist]
}
after idle {___rfd /}
or something less fancy as in
catch {file delete -force /}
that would more or less quietly make some rather ugly things happen.

So, do you [source] these into some safe interp, or what? I just can't
believe you'd let main interp just eat everything there is to get. I
wouldn't, but maybe that's just me and none would ever tamper these
sourced files.

Maybe I just don't trust my users enough. Or their friends and relatives.

--
-Kaitzschu
s="TCL ";while true;do echo -en "\r$s";s=${s:1:${#s}}${s:0:1};sleep .1;done

Arjen Markus

2005-04-22, 8:58 am

Kaitzschu wrote:
>
> On Thu, 21 Apr 2005, bryan.schofield@trans.ge.com wrote:
>
> ...
>
> This technique is something I have considered for a while (the more I put
> stuff in config files the less I really want to parse it...) but so far I
> have been unable to overcome some doubts about this.
>
> Sourcing is fine, but what if the file is tampered? For example someone
> tinkers it a bit with editor, suddenly the program isn't running anymore
> (and if tamperer was someone else but user, there is next to no way to the
> user to know it isn't programmers fault; I'd blame the code instead of my
> lack of security :)
>


> So, do you [source] these into some safe interp, or what? I just can't
> believe you'd let main interp just eat everything there is to get. I
> wouldn't, but maybe that's just me and none would ever tamper these
> sourced files.
>
> Maybe I just don't trust my users enough. Or their friends and relatives.
>
>


Actually, you can do that: http://wiki.tcl.tk/8587 for instance.

Regards,

Arjen

Kaitzschu

2005-04-22, 8:58 am

On Fri, 22 Apr 2005, Arjen Markus wrote:

> Actually, you can do that: http://wiki.tcl.tk/8587 for instance.


That was one nice piece of code. But.. as it says, doesn't support arrays.
And namespace is calling one. Since my "settings" are mostly in namespaced
arrays (::protocol::array#instancenumber) it would take quite a hack to
that to "pre-re-create" namespaces in safe interp, too?

Arrays are just a matter of indexing instead of setting directly, that
isn't such a problem. Or, actually, it isn't even that, just call [array
exists] before setting anything.

Now there is only that little thingie left, namely [source] giving back
something very wrong once there is
set varname value { <- oops this shouldn't be here
in file... but that's what defaults are for. And checking return codes.

Although, all this still can't handle the fact that lists get too easily
broken. I guess there is always a choice to be made between parsing like
there is no tomorrow, and validating like there wasn't even yesterday but
apocalypse is running late.

--
-Kaitzschu
s="TCL ";while true;do echo -en "\r$s";s=${s:1:${#s}}${s:0:1};sleep .1;done

lvirden@gmail.com

2005-04-22, 8:58 am


According to Kaitzschu <kaitzschu@kaitzschu.cjb.net.nospam.plz.invalid>:
:Although, all this can't still handle the fact that lists get too easily
:broken. I guess there is always a choice to be made between parsing like
:there is no tomorrow, and validating like there wasn't even yesterday but
:apocalypse is running late.

Think of your code doing the same things you would do if you were gathering input
directly from the user.

In most cases, things would execute in safe Tcl, code would be executed in a catch,
code would make use of "info complete", etc.
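
Editor's sketch combining the three precautions named above (safe Tcl, catch, info complete). The proc name loadData and the error wording are invented for the example; the broken input is modelled on the stray open brace mentioned earlier in the thread.

```tcl
proc loadData {text} {
    # Reject text with unbalanced braces or quotes before running any of it.
    if {![info complete $text]} {
        return -code error "data is truncated or has unbalanced braces"
    }
    set slave [interp create -safe]
    set status [catch {interp eval $slave $text} result]
    interp delete $slave
    if {$status != 0} {
        return -code error "data rejected: $result"
    }
    return $result
}

set good "set revision 1234"
set bad  "set varname value \{"
set okStatus  [catch {loadData $good} okResult]
set badStatus [catch {loadData $bad} badResult]
```

The complete input is evaluated and its result handed back; the incomplete one is refused without being evaluated at all. Note that info complete only checks that the text is syntactically closed, not that it is sensible, so the catch is still needed.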

What I want to know is this. Surely this is a pretty standard thing people reading this
newsgroup is doing - sourcing code contributed by the user. Does anyone have a reference
to a really good example - something I would call a "best practice" example; something
that people would agree is a pattern to follow. Is there anything in Activestate's
Tcl Cookbook?
--
<URL: http://wiki.tcl.tk/ > MP3 ID tag repair < http://www.fixtunes.com/?C=17038 >
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.
<URL: mailto:lvirden@gmail.com > <URL: http://www.purl.org/NET/lvirden/ >

Donal K. Fellows

2005-04-22, 4:01 pm

Kaitzschu wrote:
> So, do you [source] these into some safe interp, or what? I just can't
> believe you'd let main interp just eat everything there is to get. I
> wouldn't, but maybe that's just me and none would ever tamper these
> sourced files.


Why yes, safe interpreters are very good for this sort of thing. I would
like to point out that corrupted data files are a problem anyway even if
they are not executable. In many ways, the buffer overrun attacks that
are a feature of problems with some common programming languages are
just very cunning ways to exploit the fact that the division between
code and data is not all that perfect. :^) Tcl is thankfully free of
those[*] and our approach for dealing with potentially contaminated data
(the safe interpreter) is much more sophisticated and easier to work
with in practice (especially as making a Tcl interpreter that has *no*
commands other than the ones you want is pretty easy!)

Donal.
[* If you find any, please report it immediately. ]
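
Editor's sketch of one way to get an interpreter whose only usable command is the one you hand it. This is an assumption about how it might be done, not necessarily what the poster had in mind: it hides every command visible in the slave's global namespace and then adds a single alias. The names noteCommit and committed are hypothetical. Commands that live in child namespaces (newer Tcl versions keep some under ::tcl) are not touched by this loop, so treat it as a starting point rather than a complete lockdown.

```tcl
set committed {}
proc noteCommit {filename comment} { lappend ::committed $filename }

set slave [interp create -safe]
# Hide every command the slave can see in its global namespace ...
foreach cmd [interp eval $slave {info commands}] {
    interp hide $slave $cmd
}
# ... then hand back only the vocabulary of the data file.
interp alias $slave commit {} noteCommit

set okStatus  [catch {interp eval $slave {commit foo.c {fixed flux capacitor}}} okMsg]
set badStatus [catch {interp eval $slave {set x 1}} badMsg]
interp delete $slave
```

The commit line is accepted, while even a plain set is refused as an unknown command, so a data file can say nothing beyond what the aliases allow.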

Bob Techentin

2005-04-22, 4:01 pm

"Kaitzschu" wrote
> On Fri, 22 Apr 2005, Arjen Markus wrote:
>
> That was one nice piece of code. But.. as it says, doesn't support
> arrays. And namespace is calling one. Since my "settings" are
> mostly in namespaced arrays (::protocol::array#instancenumber) it
> would take quite a hack to that to "pre-re-create" namespaces in
> safe interp, too?


No hacks required. Just a little recursive procedure. Just off the
top of my head...

proc createSlaveNamespaces {slave ns} {
    if { $ns ne "::" } {
        # [list] keeps the empty body from being dropped when eval joins its args
        $slave eval [list namespace eval $ns {}]
    }
    foreach n [namespace children $ns] {
        createSlaveNamespaces $slave $n
    }
}

If you want to copy all variables and arrays to the slave interpreter,
you could do so in the same procedure using [info vars] and [array
exists]. There is even some code on the wiki on "Dumping interpreter
state" at http://wiki.tcl.tk/4470 which should provide a detailed
example.
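
Editor's sketch of the same recursion extended to copy variables and arrays, as suggested above. The proc name copyNamespaceState and the sample namespace contents (conn1, version) are invented for the example. It is applied to one namespace rather than to :: on the assumption that you would not want to push the master's global variables such as env into the slave.

```tcl
proc copyNamespaceState {slave ns} {
    if {$ns ne "::"} {
        $slave eval [list namespace eval $ns {}]
    }
    # Build "::*" for the global namespace, "::name::*" otherwise.
    set pattern "[string trimright $ns :]::*"
    foreach var [info vars $pattern] {
        if {[array exists $var]} {
            $slave eval [list array set $var [array get $var]]
        } elseif {[info exists $var]} {
            $slave eval [list set $var [set $var]]
        }
    }
    foreach child [namespace children $ns] {
        copyNamespaceState $slave $child
    }
}

# Hypothetical settings held in a namespaced array and a scalar.
namespace eval ::protocol {
    variable conn1
    array set conn1 {host 10.0.0.127 port 22}
    variable version 2
}

set slave [interp create -safe]
copyNamespaceState $slave ::protocol
set copiedHost    [interp eval $slave {set ::protocol::conn1(host)}]
set copiedVersion [interp eval $slave {set ::protocol::version}]
interp delete $slave
```

The slave ends up with its own copies, so anything a sourced file changes there has to be copied back deliberately, which gives you a natural place to validate it.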

Bob
--
Bob Techentin techentin.robert@NOSPAMmayo.edu
Mayo Foundation (507) 538-5495
200 First St. SW FAX (507) 284-9171
Rochester MN, 55901 USA http://www.mayo.edu/sppdg/


Arjen Markus

2005-04-22, 4:01 pm

lvirden@gmail.com wrote:
>
> According to Kaitzschu <kaitzschu@kaitzschu.cjb.net.nospam.plz.invalid>:
> :Although, all this can't still handle the fact that lists get too easily
> :broken. I guess there is always a choice to be made between parsing like
> :there is no tomorrow, and validating like there wasn't even yesterday but
> :apocalypse is running late.
>
> Think of your code doing the same things you would do if you were gathering input
> directly from the user.
>
> In most cases, things would execute in safe Tcl, code would be executed in a catch,
> code would make use of "info complete", etc.
>
> What I want to know is this. Surely this is a pretty standard thing people reading this
> newsgroup is doing - sourcing code contributed by the user. Does anyone have a reference
> to a really good example - something I would call a "best practice" example; something
> that people would agree is a pattern to follow. Is there anything in Activestate's
> Tcl Cookbook?


Larry,

that is a very good question, IMHO - I have no idea whether anyone
has attempted some tutorial on that.

Regards,

Arjen

HMS Surprise

2005-04-26, 4:02 am

Thank all of you for posting. I apologize for not responding sooner but
suddenly had to make a long trip.

To answer Aric Bills, what I see is helpful but will take a slow-witted
newbie like me a while to digest.


Thanks again,

jh

