Home > Archive > PERL Beginners > April 2004 > Question regarding splitting files
Question regarding splitting files
| Remko Lodder 2004-04-22, 1:31 pm |
| Hello there,
I have a question about how I can split files.
Let's say I have some very large files (2 GB, for example)
and I want to split them into 650 MB files. (They are plain text only.)
I was thinking about something along the lines of this (just words):
set size arguments
set input file
set output file
open the file
read the content
print line of the content to $output file
check size of $output file
continue if not too big yet
else
check which files already live in the target directory (.1 .2 .3 .4 .5 etc)
rotate file to $output file.1 (or .2 .3)
and restart the printing process
Is something like this possible (or is there perhaps an easier way)?
Thanks in advance!!
--
Kind regards,
Remko Lodder
Elvandar.org/DSINet.org
www.mostly-harmless.nl A Dutch community for helping newcomers on the
hackerscene
| |
| tomthumbkop 2004-04-22, 2:06 pm |
| Check perldoc -f read.
I think it will slurp chunks quickly. Just slurp as much as you want the new smaller file to be and output it.
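A minimal sketch of the read-based approach suggested here, assuming a byte-exact split is acceptable (a piece may end in the middle of a line). The name split_by_bytes and the 1 MB buffer are hypothetical choices, not something from the thread; the buffer avoids holding a whole 650 MB piece in memory at once.

```perl
use strict;
use warnings;

# Split $path into pieces of at most $piece_size bytes, named
# "$path.1", "$path.2", ... Returns the number of pieces written.
# split_by_bytes() and the buffer size are illustrative assumptions.
sub split_by_bytes {
    my ($path, $piece_size) = @_;
    die "piece size must be positive\n" unless $piece_size > 0;
    my $buf_size = 1024 * 1024;    # read in 1 MB blocks

    open(my $in, '<:raw', $path) or die "cannot open $path: $!";
    my ($piece, $written, $out) = (0, 0, undef);

    while (1) {
        # Never read more than the current piece still has room for.
        my $room = defined $out ? $piece_size - $written : $piece_size;
        my $want = $room < $buf_size ? $room : $buf_size;
        my $got  = read($in, my $buf, $want);
        die "read error on $path: $!" unless defined $got;
        last if $got == 0;         # end of input

        unless (defined $out) {    # start the next piece lazily
            ++$piece;
            open($out, '>:raw', "$path.$piece")
                or die "cannot write to $path.$piece: $!";
            $written = 0;
        }
        print {$out} $buf or die "write to $path.$piece failed: $!";
        $written += $got;

        if ($written >= $piece_size) {    # piece is full, close it
            close($out) or die "cannot close $path.$piece: $!";
            undef $out;
        }
    }
    if (defined $out) {
        close($out) or die "cannot close $path.$piece: $!";
    }
    close($in) or die "cannot close $path: $!";
    return $piece;
}

# Usage: perl split_bytes.pl bigfile 681574400    (650 * 1024 * 1024)
if (@ARGV == 2) {
    my $pieces = split_by_bytes(@ARGV);
    print "wrote $pieces piece(s)\n";
}
```

Because the pieces are cut at exact byte offsets, concatenating them in order reproduces the original file.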
| |
| Wiggins D Anconia 2004-04-22, 2:34 pm |
| > Hello there,
>
> I have a question about how I can split files.
>
> Let's say I have some very large files (2 GB, for example)
> and I want to split them into 650 MB files. (They are plain text only.)
>
Depending on the size of the files you may need a Perl built with
"large file support". perl -V should tell you whether your Perl is set up
to use it.
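One way to check this from inside a script is a short sketch using the core Config module, which exposes the same build settings that perl -V prints. The variable name $large is only illustrative.

```perl
use strict;
use warnings;
use Config;    # core module holding the build configuration of this perl

# uselargefiles is 'define' when perl was built with large file support.
my $large = defined $Config{uselargefiles}
    && $Config{uselargefiles} eq 'define';

print $large
    ? "large file support: yes\n"
    : "large file support: no (files over 2 GB may not work)\n";
```

From the shell, perl -V:uselargefiles reports the same setting.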
> I was thinking about something along the lines of this (just words):
>
> set size arguments
> set input file
> set output file
>
> open the file
> read the content
> print line of the content to $output file
> check size of $output file
> continue if not too big yet
> else
> check which files already live in the target directory (.1 .2 .3 .4 .5 etc)
> rotate file to $output file.1 (or .2 .3)
> and restart the printing process
>
>
>
> Is something like this possible (or is there perhaps an easier way)?
>
> Thanks in advance!!
>
Sounds like a well-thought-out plan, and it is definitely doable. Add a few
punctuation marks and a call to 'stat' (or maintain an internal counter of
the amount written so far) and you are almost done ;-).
If your intention is just to reassemble the pieces later, there are
pre-written programs that might be more appropriate; see also man dd.
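The plan quoted above, using the internal-counter variant rather than stat, could be sketched roughly as follows. The name split_by_lines is hypothetical, and the sketch assumes that pieces are numbered from .1 upward and that any existing pieces may be overwritten. Lines are kept whole, so a piece exceeds the limit only when a single line is longer than the limit.

```perl
use strict;
use warnings;

# Copy $path line by line into "$path.1", "$path.2", ..., starting a new
# piece before a line would push the current one past $max_bytes.
# split_by_lines() is an illustrative name. Returns the number of pieces.
sub split_by_lines {
    my ($path, $max_bytes) = @_;
    die "size limit must be positive\n" unless $max_bytes > 0;

    # :raw keeps length() equal to the number of bytes written.
    open(my $in, '<:raw', $path) or die "cannot open $path: $!";
    my ($piece, $written, $out) = (0, 0, undef);

    while (my $line = <$in>) {
        my $len = length $line;

        # Rotate when there is no piece yet, or when the current piece
        # already holds data and this line would not fit.
        if (!defined $out || ($written > 0 && $written + $len > $max_bytes)) {
            if (defined $out) {
                close($out) or die "cannot close $path.$piece: $!";
            }
            ++$piece;
            open($out, '>:raw', "$path.$piece")
                or die "cannot write to $path.$piece: $!";
            $written = 0;
        }
        print {$out} $line or die "write to $path.$piece failed: $!";
        $written += $len;    # internal counter instead of calling stat
    }
    if (defined $out) {
        close($out) or die "cannot close $path.$piece: $!";
    }
    close($in) or die "cannot close $path: $!";
    return $piece;
}

# Usage: perl split_lines.pl bigfile 681574400    (650 * 1024 * 1024)
if (@ARGV == 2) {
    my $pieces = split_by_lines(@ARGV);
    print "wrote $pieces piece(s)\n";
}
```

Keeping a running byte count avoids a stat call per line; stat on an open output file can also lag behind because of buffering.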
http://danconia.org
| |
| Wc Jones 2004-04-22, 2:34 pm |
| > set size arguments
> set input file
> set output file
>
> open the file
> read the content
> print line of the content to $output file
> check size of $output file
> continue if not too big yet
> else
> check which files already live in the target directory (.1 .2 .3 .4 .5 etc)
> rotate file to $output file.1 (or .2 .3)
> and restart the printing process
Here is the splitting portion:
#!/usr/local/bin/perl

use strict;
use warnings;

# Example data - 85_782 lines, 1_072_787 (words), 10_313_190 bytes - filename: syslog

my $split_into = 3;
my $line_cnt   = 0;

# First pass: count the lines so each piece gets an equal share.
open(my $in, '<', 'syslog') or die "cannot open syslog: $!";
++$line_cnt while <$in>;
close($in) or die "cannot close syslog: $!";

# Second pass: copy the lines into syslog.0, syslog.1, ...
open($in, '<', 'syslog') or die "cannot re-read syslog: $!";
for my $x (0 .. $split_into - 1) {
    my $counter = 0;
    open(my $out, '>', "syslog.$x") or die "cannot write to syslog.$x: $!";
    while (my $line = <$in>) {
        print {$out} $line;
        # Stop once this piece holds its share of the lines.
        last if ++$counter >= $line_cnt / $split_into;
    }
    close($out) or die "cannot close syslog.$x: $!";
}
close($in) or die "cannot close syslog: $!";

print "Done ... \n\n";
__END__
Cheers!
-Sx-
--
Overheard: Isn't this all kinda sudden? Mentor: Yes. Sometimes,
you just know that it's time to say goodbye. And the moment you know
it, you must do it. Teaching students on anything less than 100%
motivation and energy is not how it should be done.
| |
| Remko Lodder 2004-04-23, 4:39 pm |
| Thanks guys!! All of that helped me, and I got it to work.
I used the script below and added a few things that a friend of mine
came up with:
my $size = (stat('/usr/messages.sorted'))[7];   # file size in bytes
my $chunksize = 10 * 1024 * 1024;               # 10 MB in bytes
my $split_into = $size / $chunksize;
That makes every chunk about 10 megabytes, so I don't have to guess the number of pieces :-)
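The division above usually yields a fraction. If a whole number of pieces is wanted, rounding up is one option. A small sketch, where pieces_needed is a hypothetical helper and ceil comes from the core POSIX module:

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Number of pieces needed so that none is larger than $chunksize bytes.
# pieces_needed() is an illustrative name, not part of the script above.
sub pieces_needed {
    my ($size, $chunksize) = @_;
    die "chunk size must be positive\n" unless $chunksize > 0;
    return ceil($size / $chunksize);
}

my $chunksize = 10 * 1024 * 1024;    # 10 MB in bytes

print pieces_needed(10_313_190, $chunksize), "\n";          # prints 1
print pieces_needed(25 * 1024 * 1024, $chunksize), "\n";    # prints 3
```

Since the script divides by line count, the pieces come out near the target size only when line lengths are fairly uniform.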
Again: Thanks for the pointers and the help!!
Cheers
WC Jones wrote:
>
>
>
> Here is the splitting portion:
>
> #! /usr/local/bin/perl
>
> use strict;
> use warnings;
>
> # Example data - 85_782 lines, 1_072_787 (words), 10_313_190 bytes - filename: syslog
>
> my $split_into = 3;
> my $line_cnt;
> my $counter;
> my $x;
>
> open (ROFILE, "syslog") or die "cannot open syslog $!";
> while(<ROFILE> ) { ++$line_cnt; }
> close (ROFILE) or die "cannot close syslog $!";
>
> open (ROFILE, "syslog") or die "cannot re-read syslog $!";
> for ($x=0; $x < $split_into; ++$x) {
>
> open (WOFILE, ">syslog.$x") or die "cannot write to syslog.$x $!";
> while(<ROFILE> ) {
> print WOFILE $_;
> ++$counter;
> last if ($counter >= ($line_cnt/$split_into));
> }
>
> $counter = 0;
> close (WOFILE) or die "cannot close syslog.$x $!";
> }
>
> close (ROFILE) or die "cannot close syslog $!";
> print "Done ... \n\n";
> __END__
>
>
> Cheers!
> -Sx-
>