How to remove a single line from a flat file
| Swaroop 2007-05-15, 4:22 am |
| Hi,
I want to remove a single line from a flat file using TCL. My file
looks like this.
123096 Kumar 3
111111 Kiran 4
323456 AAAA 4
If the user gives 123096 as input, the script should remove the
entire line (the one starting with 123096). How can I do this?
-Swaroop
| |
| Leopold Gerlinger 2007-05-15, 4:22 am |
| Swaroop wrote:
> Hi,
> I want to remove a single line from a flat file using TCL. My file
> looks like this.
>
> 123096 Kumar 3
> 111111 Kiran 4
> 323456 AAAA 4
>
> If the user has given input as 123096, The script should remove the
> entire line (with 123096). How can i do this.?
>
> -Swaroop
>
I assume that you look for the first token in the line which is
delimited by whitespace. In this case I interpret the input line as a
list, hence comparing the first list element with the pattern. I do this
for all lines in the inputfile and copy them to another outputfile.
set ifp [open {C:\InputFile.txt} r]
set ofp [open {C:\OutputFile.txt} w]
set pattern 123096
while {[gets $ifp line] >= 0} {
    if {[lindex $line 0] == $pattern} {puts $ofp $line}
}
close $ifp
close $ofp
exit
Regards - Leo
| |
| Swaroop 2007-05-15, 4:22 am |
| On May 15, 10:35 am, Leopold Gerlinger <leopold.gerlin...@siemens.com>
wrote:
> I assume that you look for the first token in the line which is
> delimited by whitespace. In this case I interpret the input line as a
> list, hence comparing the first list element with the pattern. I do this
> for all lines in the inputfile and copy them to another outputfile.
>
> set ifp [open {C:\InputFile.txt} r]
> set ofp [open {C:\OutputFile.txt} w]
>
> set pattern 123096
>
> while {[gets $ifp line] >= 0} {
>
> if {[lindex $line 0] == $pattern} {puts $ofp $line}
>
> }
>
> close $ifp
> close $ofp
> exit
>
> Regards - Leo
Hi,
If I am right, by doing it as above, duplicate files will be
created. To avoid this, do I need to move the output file over the
input file after the script? Moreover, I guess I should use
" if {[lindex $line 0] != $pattern} {puts $ofp $line} " to skip the
matching line. [I have replaced == with !=]
-Swaroop
| |
| slebetman@yahoo.com 2007-05-15, 4:22 am |
| On May 15, 1:43 pm, Swaroop <swaroop.t...@gmail.com> wrote:
> On May 15, 10:35 am, Leopold Gerlinger <leopold.gerlin...@siemens.com>
> wrote:
> Hi,
> If i am right, by doing like above, duplicate files will be
> created. To avoid this do i need to move the output file to inputfile
> after the script.
Yes. You can do this from the tcl script itself by:
file rename -force $output_filename $input_filename
(-force is needed because the input file already exists.)
> Moreover, i guess i should use " if {[lindex $line
> 0] != $pattern} {puts $ofp $line} " to skip the matching line. [I have
> replaced == with !=]
In this case I guess it's safe to use lindex directly. And I admit
that I often write code that uses lindex directly on input data. But
you should be aware that lindex is sensitive to unbalanced ", { and }.
By sensitive I mean that your program will abort immediately when
lindex throws an error (unless you [catch] it of course).
If you can't control the input data format then I'd suggest:
if {[lindex [split $line] 0] != $pattern} {...
or
if {[regexp -inline {^\d+} $line] != $pattern} {...
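Putting the pieces from this part of the thread together, here is a minimal sketch of the line-at-a-time approach: copy every non-matching line to a temporary file, then rename it over the original. The proc name removeLine and the ".tmp" suffix are made up for illustration, and it assumes the key is the first whitespace-delimited field.

```tcl
# Hypothetical helper: drop every line whose first field equals key.
# Returns the number of lines removed.
proc removeLine {filename key} {
    set tmpname $filename.tmp          ;# assumed temp name in the same directory
    set in  [open $filename r]
    set out [open $tmpname w]
    set removed 0
    while {[gets $in line] >= 0} {
        # split (rather than bare lindex) avoids list-parsing errors
        # on lines with unbalanced quotes or braces
        if {[lindex [split [string trim $line]] 0] eq $key} {
            incr removed
            continue
        }
        puts $out $line
    }
    close $in
    close $out
    # -force because the destination (the original file) already exists
    file rename -force $tmpname $filename
    return $removed
}
```

Something like "removeLine C:/InputFile.txt 123096" would then leave only the other two lines in the sample file.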
| |
| Andreas Leitgeb 2007-05-15, 4:22 am |
| Swaroop <swaroop.tata@gmail.com> wrote:
> If i am right, by doing like above, duplicate files will be
> created.
If the files are sooo large that this is a concern, then the
processing is probably already so slow, that the whole task is
next to infeasible, anyway. :-)
You could also edit in-place:
either you just overwrite the portion with dummy-chars, e.g. spaces,
or you shift the whole block of data that follows.
The former is a bit easier, but you need to be extra careful with
positioning for overwrite (seek, tell), and determining the number
of spaces to write (this depends on both the encoding of input file
and the length of the current matched line!)
The latter requires opening the file with "r+", and once the matching
line is found, repeated seek-read-tell-seek-puts-tell. The encoding
*might* be irrelevant there (but no guarantees).
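A rough sketch of the first variant (overwriting the matched line with spaces), under the assumption that the channel is put into binary translation so that string lengths and seek offsets are both in bytes. The proc name blankLine is invented for this example.

```tcl
# Hypothetical helper: overwrite the first matching line with spaces.
# Returns 1 if a line was blanked, 0 otherwise.
proc blankLine {filename key} {
    set fp [open $filename r+]
    # binary translation: one character is one byte, so [string length]
    # and [tell]/[seek] offsets agree
    fconfigure $fp -translation binary
    while {1} {
        set start [tell $fp]           ;# offset of the line about to be read
        if {[gets $fp line] < 0} break
        if {[lindex [split [string trim $line]] 0] eq $key} {
            seek $fp $start
            # same byte count as the line, so the newline and the rest
            # of the file are left untouched
            puts -nonewline $fp [string repeat " " [string length $line]]
            close $fp
            return 1
        }
    }
    close $fp
    return 0
}
```

The file keeps its size; whatever reads it later has to tolerate a line of blanks.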
| |
| suchenwi 2007-05-15, 4:22 am |
| On 15 Mai, 07:43, Swaroop <swaroop.t...@gmail.com> wrote:
> If i am right, by doing like above, duplicate files will be
> created.
Not necessarily. You could do it like this:
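set fp [open $filename]
# -nonewline drops the final newline, so the split below does not
# produce an empty last element (and an extra blank line on output)
set data [read -nonewline $fp]
close $fp
set fp [open $filename w]
foreach line [split $data \n] {
    if {[lindex $line 0] ne $deletekey} {puts $fp $line}
}
close $fp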
Then your data file exists only in one instance - but you must have
enough memory to hold all data...
| |
| Larry W. Virden 2007-05-15, 8:09 am |
| On May 14, 11:58 pm, Swaroop <swaroop.t...@gmail.com> wrote:
> I want to remove a single line from a flat file using TCL.
Okay, the first thing to realize is that flat files, at least under
linux, unix, and windows, have no special access routines. This means
that one has to read in the entire file, then write out the parts of
the file that you want out.
Given that there are no silver bullets, there are several ways you
could go at this task:
1. Read the entire file into memory, then write everything out to a
new, temporary file, then rename the original, rename the temporary,
and delete the original. This technique keeps the original around
until the last moment, so that, in case of a power failure or some
other problem, you still have the original data available. You are,
however, left with a brief moment (truly less than a second, assuming
decent access to your files), where there is no file by the original
filename present. This would be a problem if the file is critical
(say, a password file, etc.)
2. Read the file a line at a time, writing out a line at a time.
Again, you have to deal with the "write to a temporary file" issues,
but if the original file is very large, then you don't take up as much
memory.
3. Open the original file for reading, read through to the point where you
want to delete, save the offset from the beginning, read the next line,
then open the file a second time, in read/write mode, seek to the
saved offset, and write out the next record, and continue reading from
the first descriptor and writing to the second. WARNING! If you
experience a power outage, program crash, user interference, network
loss, etc. you would end up with an incomplete file. However, the file
does remain in place at all times.
4. You could read in the file, write it out to a database (one record
per line), delete the record required, then read back through the
database, writing out to the original file. Again, you remove the
temporary file, but you again could experience a truncated original
file in the case of a power outage, program crash, etc.
Basically, there is no _safe_ way to do this and ensure that what you
want to do gets done completely in the case of extreme problems. I'd
go with version 1 above, typically.
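For what it's worth, option 1 could look roughly like the sketch below. The proc name and the ".new"/".bak" suffixes are assumptions made for this example, and it inherits the short window described above in which no file exists under the original name.

```tcl
# Hypothetical helper for option 1: filter in memory, write a temporary
# file, then swap it in while keeping the original as a backup until the end.
proc removeLineWithBackup {filename key} {
    set tmpname $filename.new          ;# assumed names, same directory
    set bakname $filename.bak

    set fp [open $filename r]
    set data [read -nonewline $fp]
    close $fp

    set out [open $tmpname w]
    foreach line [split $data \n] {
        if {[lindex [split [string trim $line]] 0] ne $key} {
            puts $out $line
        }
    }
    close $out

    # The original stays available under the backup name until the
    # replacement is in place.
    file rename -force $filename $bakname
    file rename $tmpname $filename
    file delete $bakname
}
```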
| |
| suchenwi 2007-05-15, 8:09 am |
| In fact, I'd do such things not in Tcl but with utilities like gawk
(provided you have *n*x or Cygwin):
mv datafile t
gawk '$1!="123096"' t > datafile
| |
| Larry W. Virden 2007-05-15, 7:10 pm |
| On May 15, 8:26 am, suchenwi <richard.suchenwirth-
bauersa...@siemens.com> wrote:
> In fact, I'd do such things not in Tcl but with utilities like gawk
> (provided you have *n*x or Cygwin):
>
> mv datafile t
> gawk '$1!="123096"' t > datafile
Which still has the problem of leaving the system without the file for
a period of time. P.S. that can be done on windows as well - take a
look at any of the windows unix-utility suites like UWIN, Cygwin,
Microsoft's Interopt/SFU software for Windows XP and Windows Server,
MKS toolkit and quite a number of other alternatives. One of these
days, maybe I'll get around to gathering information about all of
these into a page on the wiki...
There aren't many operating systems out there which allow you to just
go into a plain text file to delete lines.
If this is not a one time affair, but something that you need to do
frequently, you might want to consider changing over to use a database
that permits trivial row deletion (which is, I'd guess, most of
them ;-)
| |
| Eric Hassold 2007-05-15, 7:10 pm |
| Hi,
Larry W. Virden wrote :
> .....
> 3. Open the original file read, read through to the point where you
> want to delete, save the offset from the beginning, read the next line
> then open the file a second time, in read/write mode, seek to the
> saved offset, and write out the next record, and continue reading from
> the first descriptor and writing to the second. WARNING! If you
> experience a power outage, program crash, user interference, network
> loss, etc. you would end up with an incomplete file. However, the file
> does remain in place at all times.
However, once the in-place copy is completed, the file would have to be
truncated to the current write offset. While there are many common situations
where one needs to truncate a file at an arbitrary position (see the ftruncate()
POSIX/SV function), and this operation is supported by most modern
operating systems and file systems, this is unfortunately still impossible
with the current Tcl stable release (8.4), so this approach is not
applicable. TIP #208 introduces a new "chan" command, available in Tcl
8.5, and in particular the "chan truncate channelId ?length?" subcommand.
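With Tcl 8.5 or later, the shift-and-truncate approach might be sketched as follows. This is only an illustration: the proc name deleteLineInPlace and the 64 KB chunk size are arbitrary choices, and a crash in the middle of the copy leaves the file damaged, as warned above.

```tcl
# Hypothetical helper, needs Tcl 8.5+ for [chan truncate].
# Removes the first matching line by copying the rest of the file down
# over it, then cutting off the leftover bytes at the end.
proc deleteLineInPlace {filename key} {
    set rd [open $filename r]
    set wr [open $filename r+]
    fconfigure $rd -translation binary   ;# work in byte offsets
    fconfigure $wr -translation binary
    set found 0
    while {1} {
        set start [tell $rd]             ;# offset of the line about to be read
        if {[gets $rd line] < 0} break
        if {[lindex [split [string trim $line]] 0] eq $key} {
            set found 1
            break
        }
    }
    if {$found} {
        seek $wr $start
        # The reader is always ahead of the writer, so no unread data
        # is overwritten.
        while {[set chunk [read $rd 65536]] ne ""} {
            puts -nonewline $wr $chunk
        }
        flush $wr
        chan truncate $wr [tell $wr]
    }
    close $rd
    close $wr
    return $found
}
```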
<OT>
Working with very large files (say several GBytes) was probably not very
frequent in 2002 (when Tcl 8.4.0 was released). But with storage getting
less and less expensive, and with most filesystems and OS supporting
large files, it is quite natural TIP #206 (later merged into TIP #208)
was proposed some time later (proposed june 2004, accepted november 2004).
My feeling is that this is just an example, among many others, of Tcl
currently getting little by little out of sync with some developers'
needs. Among all goodies part of Tcl 8.5 (see http://wiki.tcl.tk/10630
), many offer solutions to immediate problems developers are faced
with. Having them still unavailable into stable (and most widely used)
branch, 3 to 5 years later, doesn't help providing a dynamic "brand
image" of Tcl.
I have no judgment about the 8.5 roadmap, I understand core devels are
already making impressive work on it, and I'm not advocating here for a
quick release. I'm only concerned about the opportunity to introduce new
features in Tcl more often than once every 5 years or so, so a larger
community can view Tcl as an agile language, brought by a dynamic
community, and offering practical solutions to their needs.
No doubt some features introduced in 8.5 need to wait for a major
release, because they break compatibility, need long validation, or
because they imply refactoring of Tcl core code. But others could be
very easily backported to (or even just put in) 8.4. I'm thinking of a
Tcl/TK based on 8.4 for stability, with e.g. new commands and
subcommands like chan, dict, lassign, lrepeat, string reverse, encoding
dirs, binary with new formats, maybe Xft support, etc...
Maybe this possibility has been already discussed among TCT members?
</OT>
Eric
-----
Eric Hassold
Evolane - http://www.evolane.com/
| |
| slebetman@yahoo.com 2007-05-15, 7:10 pm |
| On May 15, 10:25 pm, "Larry W. Virden" <lvir...@gmail.com> wrote:
> On May 15, 8:26 am, suchenwi <richard.suchenwirth-
>
> bauersa...@siemens.com> wrote:
>
>
> Which still has the problem of leaving the system without the file for
> a period of time. P.S. that can be done on windows as well - take a
> look at any of the windows unix-utility suites like UWIN, Cygwin,
> Microsoft's Interopt/SFU software for Windows XP and Windows Server,
> MKS toolkit and quite a number of other alternatives. One of these
> days, maybe I'll get around to gathering information about all of
> these into a page on the wiki...
>
> There aren't many operating systems out there which allow you to just
> go into a plain text file to delete lines.
>
> If this is not a one time affair, but something that you need to do
> frequently, you might want to consider changing over to use a database
> that permits trivial row deletion (which is, I'd guess, most of
> them ;-)
Well.. the only dangerous part is:
mv datafile t
On most modern filesystems this is fairly atomic and safe, just like a
database, since it only involves changing the file's name. On a
journaled filesystem, if this operation happens to fail then on next
powerup the file name will be restored to its original name.
So if you're worried about this then don't use a filesystem like
FAT32. Instead use NTFS or ext3 or HFS+ (and remember to turn on
journaling for HFS+).
| |
| billposer@alum.mit.edu 2007-05-17, 10:11 pm |
| On May 16, 7:57 am, Darren New <d...@san.rr.com> wrote:
> I never understood why everyone adopted the most primitive useless file
> system organization available as the "standard".
I suspect that it is because the pre-Unix systems with a zillion
different types of files with different types of indices and locks and
access restrictions, not to mention all sorts of things that Unix
caused to be treated like files but were not files on pre-Unix
systems, were, correctly, considered hopelessly baroque and
obstructive. The few useful features of the Dark Ages of file systems
got thrown out with the bathwater.
| |
| Donal K. Fellows 2007-05-17, 10:11 pm |
| billposer@alum.mit.edu wrote:
> The few useful features of the Dark Ages of file systems
> got thrown out with the bathwater.
Having used such things long ago, let me observe that there was a
tremendous lot of bathwater that we were well rid of.
Donal.
| |
| Darren New 2007-05-17, 10:11 pm |
| Donal K. Fellows wrote:
> billposer@alum.mit.edu wrote:
>
> Having used such things long ago, let me observe that there was a
> tremendous lot of bathwater that we were well rid of.
And let me note that while it's true there was a lot of junk on some
systems, there was a lot of good too.
I think we're starting to get to where the want-to needs is starting to
run into the able-to power of the machines again. Systems had mechanisms
back then for (say) seeking to meaningfully-identifiable places in a
file, adding/deleting/modifying records in the middle, and so on,
because you couldn't afford to duplicate a multi-megabyte file just
because you added something in the middle. Nowadays we're getting files
too large to reasonably fit on one disk, and we have no way of editing
them. Reinventing the wheel here.
If UNIX had just a couple more operations, like "insert some bytes" and
"delete some bytes", which would be pretty easy to add[1] by simply
keeping a "bytes used in this block" kind of counter for each block, you
could eliminate a whole class of problems.
Of course, what people are doing is writing their own file systems and
publishing them as services rather than OS APIs. I refer here to things
like Google's file system, Amazon's S3, database servers, and so on. So
maybe what we really need is portable IPC that doesn't suck, instead.
[1] Of course, all the infrastructure like locking and caching and stuff
would have to be updated, but it's conceptually simple.
--
Darren New / San Diego, CA, USA (PST)
His kernel fu is strong.
He studied at the Shao Linux Temple.
| |
| Neil Madden 2007-05-17, 10:11 pm |
| slebetman@yahoo.com wrote:
> On May 16, 7:53 pm, "Larry W. Virden" <lvir...@gmail.com> wrote:
>
> Nope, that never happens. Rename is the only "file destroying" part.
>
> mv datafile t
> # at this point datafile still exist but renamed to t
....
I think Larry's point is that if another process now tries to access
this file using the name "datafile" it won't find it. Reversing the
operations as Atte Kojo suggests solves this.
-- Neil
| |
| Donal K. Fellows 2007-05-17, 10:11 pm |
| Darren New wrote:
> If UNIX had just a couple more operations, like "insert some bytes" and
> "delete some bytes", which would be pretty easy to add[1] by simply
> keeping a "bytes used in this block" kind of counter for each block, you
> could eliminate a whole class of problems.
Those are things that it would be nice to have. But it's amazing how
much can be done without them, and they would certainly make
implementing a filesystem much more difficult...
Donal.
| |
| Darren New 2007-05-17, 10:11 pm |
| Donal K. Fellows wrote:
> Those are things that it would be nice to have. But it's amazing how
> much can be done without them, and they would certainly make
> implementing a filesystem much more difficult...
I can't imagine why they would make implementing the file system more
difficult, other than perhaps lseek(), which is a silly interface to
start with for a variety of reasons.
Remember that the point of using a flat file system wasn't that it was
better for programmers, but that it was easier to implement in the
kernel and matched the semantics of mapping a file into memory space a
la Multics.
Of course, a mechanism for having UNIX-style files and more powerful
files that nevertheless could be read compatibly would be best. To have
to know which kind of file your source code is in so you can compile it,
for example, is one of the bad features that some of the old file
systems indeed had.
--
Darren New / San Diego, CA, USA (PST)
His kernel fu is strong.
He studied at the Shao Linux Temple.
| |
| Donal K. Fellows 2007-05-17, 10:11 pm |
| Darren New wrote:
> Donal K. Fellows wrote:
>
> I can't imagine why they would make implementing the file system more
> difficult
Inserting a block should be fairly easy, I admit, but inserting a byte
at a random location? That's a whole 'nother kettle of fish.
Donal.
| |
| slebetman@yahoo.com 2007-05-17, 10:11 pm |
| On May 17, 11:41 pm, Neil Madden <n...@cs.nott.ac.uk> wrote:
> slebet...@yahoo.com wrote:
>
>
>
> ...
>
> I think Larry's point is that if another process now tries to access
> this file using the name "datafile" it won't find it. Reversing the
> operations as Atte Kojo suggests solves this.
>
No, that was not what he was worried about :
On May 15, 7:50 pm, "Larry W. Virden" <lvir...@gmail.com> wrote:
> but you again could experience a truncated original
> file in the case of a power outage, program crash, etc.
>
> Basically, there is no _safe_ way to do this and ensure that what you
> want to do gets done completely in the case of extreme problems. I'd
> go with version 1 above, typically.
Larry still had the notion that there is no safe way to modify a file
in the event of a random power outage. Which isn't true anymore for
modern journaled filesystems. I was just pointing out that Richard's
solution doesn't lose data in case of failure (the fact that the data
is in a file with a different name is a different issue).
| |
| Darren New 2007-05-17, 10:11 pm |
| Donal K. Fellows wrote:
> Inserting a block should be fairly easy, I admit, but inserting a byte
> at a random location? That's a whole 'nother kettle of fish.
And if you're keeping track of the number of used bytes in each block,
that's pretty much the same operation. If there's room in the block,
insert the byte. If not, insert a new block in between and insert the
byte into it. That's why I said it just makes byte-offset seeking
harder, because there's no fixed relationship between bytes and blocks.
--
Darren New / San Diego, CA, USA (PST)
His kernel fu is strong.
He studied at the Shao Linux Temple.
| |
| slebetman@yahoo.com 2007-05-18, 8:07 am |
| On May 18, 7:34 am, Darren New <d...@san.rr.com> wrote:
> Donal K. Fellows wrote:
>
> And if you're keeping track of the number of used bytes in each block,
> that's pretty much the same operation.
On most implementations of filesystems, it's not the same. Blocks are
essentially a linked-list-like structure, sometimes implemented as a
table of link pointers (like FAT32), sometimes as an actual linked list.
Inserting a random block in the middle of a sequence of blocks is very
fast. Simply point the current table entry to the location of the
inserted block and point the inserted block's table entry to the next
block in the list.
Bytes on the other hand don't have a table (or tables) to keep track
of their order. Assuming a block of less than 65k, a table per
block keeping track of bytes the same way we keep track of blocks
would require two bytes of overhead per byte of data. This is
obviously a huge waste of space: basically two thirds of your disk is
unusable.
> If there's room in the block,
> insert the byte. If not, insert a new block in between and insert the
> byte into it. That's why I said it just makes byte-offset seeking
> harder, because there's no fixed relationship between bytes and blocks.
Ah this is much more sensible. Instead of treating bytes like blocks
have an automatic function at the OS level to do insert-and-move for
you. Kind of like a memmove() function for disk I/O. However, such an
algorithm can easily be implemented by the user. And unix has a long
tradition of letting such things be solved by the user (see:
http://www.jwz.org/doc/worse-is-better.html for example). Doing it at
the OS level of course has the advantage of being able to prevent
simultaneous edits from happening.
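As an illustration of how small the user-level version can be, here is a sketch of an "insert some bytes" operation in Tcl. The proc name insertBytes is made up, and for brevity it holds the whole tail of the file in memory instead of moving it in chunks.

```tcl
# Hypothetical helper: insert bytes at a byte offset by rewriting the tail.
proc insertBytes {filename offset bytes} {
    set fp [open $filename r+]
    fconfigure $fp -translation binary   ;# offsets and lengths in bytes
    seek $fp $offset
    set tail [read $fp]                  ;# everything after the insert point
    seek $fp $offset
    puts -nonewline $fp $bytes$tail      ;# new bytes first, then the old tail
    close $fp
}
```

Nothing here stops another process from writing to the file at the same time, which is the advantage an OS-level operation would have.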
| |
| Atte Kojo 2007-05-18, 8:07 am |
| On 17 touko, 18:30, Darren New <d...@san.rr.com> wrote:
> I think we're starting to get to where the want-to needs is starting to
> run into the able-to power of the machines again. Systems had mechanisms
> back then for (say) seeking to meaningfully-identifiable places in a
> file, adding/deleting/modifying records in the middle, and so on,
> because you couldn't afford to duplicate a multi-megabyte file just
> because you added something in the middle. Nowadays we're getting files
> too large to reasonably fit on one disk, and we have no way of editing
> them. Reinventing the wheel here.
>
> If UNIX had just a couple more operations, like "insert some bytes" and
> "delete some bytes", which would be pretty easy to add[1] by simply
> keeping a "bytes used in this block" kind of counter for each block, you
> could eliminate a whole class of problems.
I think that UNIX filesystems are simple (primitive if you want)
because UNIX itself is simple (primitive). It's just like keeping all
configuration information in ASCII files (or using ex ;-).
The system provides a very simple filesystem and it's up to the
programmer to implement a more advanced (complicated) mechanism on top
of it in the rare cases it's needed. Take syslog for example; file
access is serialized by using a global daemon that is the only
process writing to the log files. Really simple and elegant if you ask
me :). If you want something more complicated than that, then you
should be using a database which has its own methods for locking,
caching and stuff.
It would take a lot of legwork to convince me to use a filesystem with
almost database-like functionality just because a few programs might
need the features ;-)
| |
| Neil Madden 2007-05-18, 8:07 am |
| slebetman@yahoo.com wrote:
> On May 17, 11:41 pm, Neil Madden <n...@cs.nott.ac.uk> wrote:
>
> No, that was not what he was worried about :
I think you are contradicted by the quoted history of this thread (above):
"Which still has the problem of leaving the system without the file for
a period of time."
and by a previous message, where Larry says:
"Think password file - with this
approach, until the gawk finishes, there would be no password file on
the system... not a good state to have your machine."
-- Neil
| |
| Larry W. Virden 2007-05-18, 8:07 am |
| On May 18, 7:10 am, Neil Madden <n...@cs.nott.ac.uk> wrote:
> I think you are contradicted by the quoted history of this thread (above):
Now, now, no need to argue over what I thought or meant. I'm sitting
right here - feel free to ask me, in private or public, what I meant.
I'll try, yet again, to explain.
There are several states that critical files (like passwords or other
types of access control or resource listing files) can be in.
1. Fully existent. Think a quiet period when no changes are occurring
to the list of users and passwords.
2. Not present - in some of the previous discussions, this would be
the case if one did the move of password to some temporary name.
3. Incomplete - again, this would be the case from the earlier
examples, where one moved the file and then rewrote it.
4. Inconsistent - this state applies to the last example, where one
creates a temporary file then does a move of the temporary file to the
authority file. So, how can there be inconsistent state here? Let's
play "imagine this"...
Application 1 opens a file and starts reading through, looking for
information.
Application 2 creates a temporary file, consisting of new,
replacement, or remaining records after a deletion.
Application 1 continues reading
Application 2 completes the creation of the replacement file and does
the move.
Application 1 is reading a file that doesn't exist any longer,
essentially. I don't believe that it is going to see the new file - it
didn't open it. So it is only going to see what was in the original
file, which must be in cache or something.
This is where it would really be useful to have fully functional file
locking, so that at the time of opening a file, one opens it with
locking that says "someone is reading this file" or "someone is
writing this file". If someone is writing the file at the time someone
wants to read it, then likely one would wait. If someone is reading
the file at the time someone else is reading it, no problem, let it
happen. If someone is reading the file when someone wants to move the
temporary file into place, then I guess one would wait, or, perhaps,
some sort of "override lock" mechanism might be put into place that
would signal the reader "hey, something has changed - you've lost your
lock, you need to start over somehow ".
J Average Developer is going to look at this thread and say "boy, that
old guy is certainly paranoid". And I say "yup, young whippersnapper.
After programming for developers for 30 years ... and in particular
doing maintenance fixes for most of that time on the same code base,
you'd find you became paranoid as well."
| |
| Donal K. Fellows 2007-05-18, 7:09 pm |
| Larry W. Virden wrote:
> This is where it would really be useful to have fully functional file
> locking, so that at the time of opening a file, one opens it with
> locking that says "someone is reading this file" or "someone is
> writing this file".
Traditional Unix filesystems (i.e. anything local with inodes) do this
for you. What happens is that when you open the file, you increment the
(internal) reference count on the inode so that when the file is
deleted, it doesn't *actually* get deleted until the last process with
it open closes that file handle. This is *very* nice indeed, and is a
major factor behind the way that Unix systems don't need to be rebooted
very often, even when carrying out fairly significant surgery on
applications and libraries. (IIRC, by convention sending SIGHUP to
services gets them to reread their config files and reopen everything;
some daemons use SIGUSR1 just to reopen their log files.)
Windows instead goes for an approach using locking, with the side effect
that systems are far more likely to need a reboot after a library
update. (There's even a special call to arrange for a file to be deleted
on next reboot, precisely to work around the over-zealous locking...)
Donal.
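The behaviour described for local Unix filesystems can be tried out with a small script like the one below. It is only a sketch: the proc name and the file names are invented, and on Windows the rename may simply fail while the reader still has the file open.

```tcl
# Hypothetical experiment: replace a file while a reader has it open.
# Returns what the old reader saw and what a fresh open sees.
proc readAcrossReplace {dir} {
    set name [file join $dir data.txt]
    set tmp  [file join $dir data.tmp]

    set fp [open $name w]
    puts $fp "old contents"
    close $fp
    set reader [open $name r]          ;# reader holds the original file open

    set fp [open $tmp w]
    puts $fp "new contents"
    close $fp
    file rename -force $tmp $name      ;# swap in the replacement

    set seenByReader [gets $reader]    ;# still reads through the old handle
    close $reader

    set fp [open $name r]
    set seenByNewOpen [gets $fp]
    close $fp
    file delete $name
    return [list $seenByReader $seenByNewOpen]
}
```

On a typical local Unix filesystem I would expect the existing reader to get "old contents" and a fresh open to get "new contents", which is exactly the "inconsistent" state Larry describes.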
| |
| Darren New 2007-05-18, 7:09 pm |
| Atte Kojo wrote:
> access is serialized by using a global daemon that that is the only
> process writing to the log files. Really simple and elegant if you ask
> me
Agreed. I'd be happier if Linux had non-suckful IPC too. :-)
But the fact that if you want to (say) edit the syslog file you have to
turn off syslog (so it's not writing to the file while you edit it),
that tells me you're missing something in the kernel. That you have to
actually have the web server (for example) start using a new log file so
you can rotate the old log records to a different partition says there's
something wrong there, to me. That you need lock files so sendmail
doesn't clobber something while you're reading your mail with a MUA says
there's something missing there.
When you actually say "what are all the work-arounds I use to account
for the crummy file system", you begin to realize the work-arounds are
so common you don't even notice them any more.
> It would take a lot of legwork to convince me to use a filesystem with
> almost database-like functionality just because a few programs might
> need the features ;-)
Have you ever used one?
I'm not being snide here. I'm just pointing out that I bet a bunch of
people who never used a hierarchical file system would say the same
thing about directories.
Once you get used to being able to seek in files based on
contextually-relevant information, and being able to edit a large file
without having to rescrub the whole thing every time, you realize how
much you're missing.
I.e., the reason you only have a few programs that take advantage of
such functionality is that such functionality is so difficult to take
advantage of. And there are a ton of programs that could certainly use
such functionality, but instead just rewrite the entire file.
--
Darren New / San Diego, CA, USA (PST)
His kernel fu is strong.
He studied at the Shao Linux Temple.
| |
| Darren New 2007-05-18, 7:09 pm |
| Larry W. Virden wrote:
> If someone is reading
> the file at the time someone else is reading it, no problem,
No. If nobody is writing the file, or waiting to write the file, no
problem. Otherwise, you get what Linux does (did?), which is
write-starvation.
> J Average Developer is going to look at this thread and say "boy, that
> old guy is certainly paranoid".
Nah. Just worried about reliability. It's amazing how many corner cases
just plain aren't handled right in lots of code.
--
Darren New / San Diego, CA, USA (PST)
His kernel fu is strong.
He studied at the Shao Linux Temple.
| |
| Darren New 2007-05-18, 7:09 pm |
| Donal K. Fellows wrote:
> Windows instead goes for an approach using locking, with the side effect
> that systems are far more likely to need a reboot after a library
> update. (There's even a special call to arrange for a file to be deleted
> on next reboot, precisely to work around the over-zealous locking...)
The only files that you can't open in a way that still allows deleting them
while they're in use are executables. If you have a data file, a log
file, a config file, etc, just open it with delete-while-open
permissions turned on, and you can delete it while it's open, with the
same semantics UNIX uses.
It's just not the default in most implementations of stdio, apparently,
for some reason.
--
Darren New / San Diego, CA, USA (PST)
His kernel fu is strong.
He studied at the Shao Linux Temple.
| |
|
| On Fri, 18 May 2007 10:57:00 -0700, Darren New <dnew@san.rr.com>
wrote:
>Agreed. I'd be happier if Linux had non-suckful IPC too. :-)
Whatever that means. Want to bet that if you provide a high-level
definition of what you want then loads of people will argue about it?
>But the fact that if you want to (say) edit the syslog file you have to
>turn off syslog (so it's not writing to the file while you edit it),
>that tells me you're missing something in the kernel.
If you need to edit it, it's not a log file. The mechanism is fine for
what it needs to do. If you're talking about keeping summary instead
of detail, see below.
> That you have to
>actually have the web server (for example) start using a new log file so
>you can rotate the old log records to a different partition says there's
>something wrong there, to me.
So you want to allow infinite log files? Anything that writes a log
should have a way of splitting "old" from "current" - then you can do
what you like with the "old" - summarize, or even edit it!
>That you need lock files so sendmail
>doesn't clobber something while you're reading your mail with a MUA says
>there's something missing there.
Missing? If two processes need to change the same thing you need a
lock (type unspecified). Don't mistake an imperfect implementation for
evidence either way - whatever mechanisms are provided can be misused.
>When you actually say "what are all the work-arounds I use to account
>for the crummy file system", you begin to realize the work-arounds are
>so common you don't even notice them any more.
If you think "work-around" you will do the wrong thing. Unix/Linux is
a low-level platform. If you need something that is not there you have
to find it or create it. Most people don't look very far, and those who
do create something, whether or not they looked properly first, don't
usually make much effort to make it separable so that it could be
re-used.
| |
| alexswilliams 2007-05-20, 5:51 am |
| As wonderful as this diversion about OS IO mechanisms is (very interesting for a comp-sci novice to read)...
quote: Originally posted by Larry W. Virden
On May 14, 11:58 pm, Swaroop <swaroop.t...@gmail.com> wrote:
Basically, there is no _safe_ way to do this and ensure that what you
want to do gets done completely in the case of extreme problems.
If there's no way of doing it, change the problem :).
I've always been told that deleting a line was best done by having a character at the beginning of the line that read, say, "O" for ok, and "D" for deleted; and the program would skip over any line beginning with a D. Furthermore, you could add in any other status codes you fancied, like B for blank record (in case one needs adding).
And, I know, once in a while you'd come up against the housekeeping routine that copies records into a new file, maybe sorts them, and so on. But unless it was a particularly active file, that's what "we're going down for maintenance" could be about. Traditionally, you set the read-only flag, and start the housekeeping in a temporary file. At the end of the routine, you make the old filename point to the new file, (which is a fairly atomic process), and clear the RO flag.
Now to implement that in tcl.
Alex | |
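A minimal Tcl sketch of the scheme Alex describes, under stated assumptions: every record is assumed to start with a one-character status flag followed by whitespace ("O 123096 Kumar 3"), which is not the layout of the original poster's file, and the proc names `mark_deleted` and `compact` are hypothetical. The read-only-flag step is left out. `file rename -force` maps to an atomic rename on typical local Unix filesystems; that is an assumption about the platform, not a guarantee.

```tcl
# Hypothetical helpers. Assumed record layout: "<status> <id> <name> <count>",
# where <status> is "O" (ok) or "D" (deleted).

# Flag the first live record with the given id as deleted by overwriting
# its status character in place. Returns 1 if a record was flagged, else 0.
proc mark_deleted {fname id} {
    set fid [open $fname r+]
    # binary translation keeps tell/seek offsets equal to byte offsets
    fconfigure $fid -translation binary
    set found 0
    while {1} {
        set pos [tell $fid]              ;# offset of the start of this line
        if {[gets $fid line] < 0} break
        if {[regexp {^O\s+(\S+)} $line -> key] && $key eq $id} {
            seek $fid $pos
            puts -nonewline $fid "D"     ;# same length, so nothing shifts
            set found 1
            break
        }
    }
    close $fid
    return $found
}

# Housekeeping: copy the live records to a temporary file, then rename it
# over the original so readers see either the old file or the new one.
proc compact {fname} {
    set tmp $fname.tmp
    set in  [open $fname r]
    set out [open $tmp w]
    while {[gets $in line] >= 0} {
        if {[string index $line 0] ne "D"} {
            puts $out $line
        }
    }
    close $in
    close $out
    file rename -force -- $tmp $fname
}
```

Because the flag is overwritten with a character of the same length, `mark_deleted` never has to move the rest of the file; only `compact` rewrites it, and that can be left for a maintenance window.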
|
| On May 15, 5:58 am, Swaroop <swaroop.t...@gmail.com> wrote:
> Hi,
> I want to remove a single line from a flat file using TCL. My file
> looks like this.
>
> 123096 Kumar 3
> 111111 Kiran 4
> 323456 AAAA 4
>
> If the user has given input as 123096, The script should remove the
> entire line (with 123096). How can i do this.?
>
> -Swaroop
When I expect to have this kind of requirement more than once, I try
to make a convenient API, e.g., in this case, treat the file as a
list:
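# the api
proc with_file_as_list {fname listVar body} {
    upvar $listVar lines
    set fid [open $fname r]
    set lines [split [read $fid] \n]
    close $fid
    set copy $lines
    # run the caller's block with the list variable in scope
    uplevel 1 $body
    # rewrite the file only if the block changed the list
    if {$lines ne $copy} {
        set fid [open $fname w]
        # -nonewline: the split list already carries the file's trailing
        # newline as a final empty element, so don't add another one
        puts -nonewline $fid [join $lines \n]
        close $fid
    }
}
# use some list manipulation package (apply needs Tcl 8.5)
package require struct::list
with_file_as_list lines.txt lines {
    set lines [struct::list filter $lines {apply {line {expr {[lindex $line 0] != 123096}}}}]
}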
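For the original question alone, the same read-filter-rewrite idea can be sketched without tcllib or `apply`. This is only an illustration: `remove_lines_by_key` is a hypothetical name, and it assumes the key is the first whitespace-delimited token on the line. It pulls that token out with `regexp` rather than `lindex`, so lines that are not well-formed Tcl lists (for example, ones containing an unbalanced brace) do not raise an error, and it compares keys as strings rather than numbers.

```tcl
# Hypothetical helper: rewrite $fname without the lines whose first
# whitespace-delimited token equals $key. Returns the number of lines removed.
proc remove_lines_by_key {fname key} {
    set fid [open $fname r]
    set data [read $fid]
    close $fid

    set kept {}
    set removed 0
    foreach line [split $data \n] {
        # take the first token with regexp instead of treating the line as a list
        if {[regexp {^\s*(\S+)} $line -> first] && $first eq $key} {
            incr removed
        } else {
            lappend kept $line
        }
    }
    # only touch the file when something was actually removed
    if {$removed > 0} {
        set fid [open $fname w]
        # the final empty element from split reproduces the trailing newline
        puts -nonewline $fid [join $kept \n]
        close $fid
    }
    return $removed
}
```

A call such as `remove_lines_by_key lines.txt 123096` would drop the "123096 Kumar 3" record. Like the proc above, it rewrites the file in place, so the concerns about crashes and concurrent access raised earlier in the thread still apply.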
| |
| Larry W. Virden 2007-05-21, 8:09 am |
| On May 18, 1:52 pm, "Donal K. Fellows"
<donal.k.fell...@manchester.ac.uk> wrote:
> Larry W. Virden wrote:
>
> Traditional Unix filesystems (i.e. anything local with inodes) do this
> for you.
Alas, the environment I use most typically doesn't use local disk
much. Instead, NFS is king, with farms of file servers and, these
days, specialized network storage devices. And I still don't trust NFS
locking. I still remember the days when apps would crash after having
created an NFS lock, leaving the app in a state where one had to
reboot to fix things. And of course, in a server environment, you
don't want to have to reboot the server to fix such things...
| |
| Larry W. Virden 2007-05-21, 8:09 am |
| On May 21, 6:35 am, iu2 <isra...@elbit.co.il> wrote:
> # the api
> proc with_file_as_list {fname listVar body} {
> upvar $listVar lines
> set fid [open $fname r]
> set lines [split [read $fid] \n]
> close $fid
> set copy $lines
>
> uplevel 1 $body
>
> if {$lines ne $copy} {
> set fid [open $fname w]
> puts $fid [join $lines \n]
> close $fid
> }
>
> }
>
> # use some list manipulation package
> package require struct::list
>
> with_file_as_list lines.txt lines {
> set lines [struct::list filter $lines {apply {line {expr {[lindex $line 0] != 123096}}}}]
>
> }
And don't forget to write out that file afterwards. And of course,
hopefully this is a single machine/single user/single program type
system, because otherwise, you need to add locking, at the very least.
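One common way to add such locking from Tcl is a lock file created with exclusive-create flags. The sketch below is only an illustration: `with_lock`, its retry count, and its delay are hypothetical choices. It relies on `open` with `{WRONLY CREAT EXCL}` failing when the file already exists, which is atomic on typical local filesystems but has historically been unreliable over NFS. A process that crashes while holding the lock leaves a stale lock file behind.

```tcl
# Hypothetical helper: run $body in the caller's scope while holding $lockfile.
proc with_lock {lockfile body {retries 50} {delay_ms 100}} {
    set acquired 0
    for {set i 0} {$i < $retries} {incr i} {
        # EXCL makes open fail if the lock file already exists; any other
        # open failure (e.g. permissions) is retried the same way
        if {![catch {open $lockfile {WRONLY CREAT EXCL}} fid]} {
            set acquired 1
            break
        }
        after $delay_ms
    }
    if {!$acquired} {
        error "could not acquire lock $lockfile"
    }
    puts $fid [pid]                      ;# record the owner, for debugging
    close $fid

    # run the body, then release the lock even if the body raised an error
    set code [catch {uplevel 1 $body} result]
    file delete -- $lockfile
    return -code $code $result
}
```

Every program that touches the data file has to go through the same lock file for this to mean anything; it is advisory, not enforced by the OS.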
| |
| Donal K. Fellows 2007-05-21, 7:10 pm |
| Larry W. Virden wrote:
> Alas, the environment I use most typically doesn't use local disk
> much. Instead, NFS is king, with farms of file servers and, these
> days, specialized network storage devices.
While that's OK for lightly-loaded stuff, it's terrible for anything
that needs real performance and resilience. But doing better is
non-trivial. (We use a SAN for some things, and local disks with careful
backup strategies for others. User filestore is usually on AFS, but I
keep mine local so that I can continue to work without any network at
all; a feature that's frequently very useful.)
> And I still don't trust NFS locking.
Wise man.
Donal.
| |
|
| On May 21, 3:52 pm, "Larry W. Virden" <lvir...@gmail.com> wrote:
> On May 21, 6:35 am, iu2 <isra...@elbit.co.il> wrote:
>
> And don't forget to write out that file afterwards. And of course,
> hopefully this is a single machine/single user/single program type
> system, because otherwise, you need to add locking, at the very least.
All true, but the file *is* written at the end of the block. That's
what's nice about this proc. You treat the file as a list, and if any
change occurs to the list, the file is updated.
This kind of "with" proc is very powerful, and I use it quite a
bit. I think this method originated in Lisp, and it now penetrates
other dynamic languages, such as Ruby and Python.
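The same "with" pattern can be sketched for plain channels, as a rough analogue of Lisp's with-open-file. The name `with_open_file` and its argument order are hypothetical, not an existing Tcl command. The point of the pattern is that the cleanup (here, `close`) runs whether or not the body raises an error.

```tcl
# Hypothetical helper: open $fname with $mode, bind the channel to the
# variable named $chanVar in the caller, run $body there, and always
# close the channel afterwards.
proc with_open_file {fname mode chanVar body} {
    upvar 1 $chanVar chan
    set chan [open $fname $mode]
    # catch the body's outcome so the channel is closed on every path
    set code [catch {uplevel 1 $body} result]
    close $chan
    # hand the body's result (or error) back to the caller
    return -code $code $result
}

# Usage sketch (assumes lines.txt exists):
#   with_open_file lines.txt r fid {
#       set first [gets $fid]
#   }
```

The body runs in the caller's scope through `uplevel 1`, so variables it sets, such as `first` in the usage sketch, remain visible after the block.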
|