Question regarding splitting files
Remko Lodder

2004-04-22, 1:31 pm

Hello there,

I have a question about how I can split files.

Let's say I have some very large files (2 GB, for example)
and I want to split them into 650 MB files (they are plain text only).

I was thinking about something along the lines of this (just words):

set size arguments
set input file
set output file

open the file
read the content
print line of the content to $output file
check size of $output file
continue if not too big yet
else
check which files already live in the target directory (.1 .2 .3 .4 .5 etc)
rotate file to $output file.1 (or .2 .3)
and restart the printing process



Is something like this possible (or perhaps easier or something?)

Thanks in advance!!

--

Kind regards,

Remko Lodder
Elvandar.org/DSINet.org
www.mostly-harmless.nl A Dutch community for helping newcomers on the
hackerscene

tomthumbkop

2004-04-22, 2:06 pm

Check perldoc -f read.

I think it will slurp chunks quickly. Just slurp as much as you want the new smaller file to be and output it.
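
A minimal sketch of that idea, under a few assumptions that are not from this thread: the pieces are named after the input file with a numeric suffix (.1, .2, ...), cutting in the middle of a line is acceptable, and the sub name split_by_bytes and the 1 MB default buffer are purely illustrative. It copies through a small buffer instead of pulling a whole 650 MB piece into memory with a single read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $in_name into pieces of at most $piece_size bytes,
# named "$in_name.1", "$in_name.2", ... Returns the number of pieces written.
sub split_by_bytes {
    my ($in_name, $piece_size, $buf_size) = @_;
    $buf_size ||= 1024 * 1024;    # assumed default: read 1 MB at a time

    open(my $in, '<', $in_name) or die "cannot open $in_name: $!";
    binmode($in);

    my ($piece, $room, $out) = (0, 0, undef);
    while (1) {
        # Never ask for more than still fits into the current piece.
        my $want = ($room > 0 && $room < $buf_size) ? $room : $buf_size;
        $want = $piece_size if $want > $piece_size;

        my $got = read($in, my $buf, $want);
        die "read error on $in_name: $!" unless defined $got;
        last if $got == 0;        # end of file

        if ($room == 0) {         # current piece is full, or none is open yet
            close($out) or die "cannot close $in_name.$piece: $!" if $out;
            ++$piece;
            open($out, '>', "$in_name.$piece")
                or die "cannot write $in_name.$piece: $!";
            binmode($out);
            $room = $piece_size;
        }
        print {$out} $buf or die "cannot write $in_name.$piece: $!";
        $room -= $got;
    }
    close($out) or die "cannot close $in_name.$piece: $!" if $out;
    close($in)  or die "cannot close $in_name: $!";
    return $piece;
}
```

It could be called as split_by_bytes('bigfile.txt', 650 * 1024 * 1024), where bigfile.txt is a placeholder name. Because it counts bytes rather than lines, the last line of a piece will usually continue in the next piece.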

Wiggins D Anconia

2004-04-22, 2:34 pm

> Hello there,
>
> I have a question about how I can split files.
>
> Let's say I have some very large files (2 GB, for example)
> and I want to split them into 650 MB files (they are plain text only).
>


Depending on the size of the files, you may need a Perl built with
"large file support". perl -V should tell you whether your Perl is set up
for it.
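
As a rough sketch, the same check can be made from inside a script through the core Config module, which exposes the build settings that perl -V prints. The sub name has_large_file_support is a hypothetical helper, not something from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;    # core module exposing the build settings that perl -V prints

# Hypothetical helper: true if this perl was built with large file support.
sub has_large_file_support {
    return defined($Config{uselargefiles})
        && $Config{uselargefiles} eq 'define';
}

printf "large file support: %s (file offsets are %s bytes wide)\n",
    has_large_file_support() ? 'yes' : 'no',
    defined($Config{lseeksize}) ? $Config{lseeksize} : 'unknown';
```

From the shell, perl -V:uselargefiles prints just that one setting.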

> I was thinking about something in the lines of this; (just words)
>
> set size arguments
> set input file
> set output file
>
> open the file
> read the content
> print line of the content to $output file
> check size of $output file
> continue if not too big yet
> else
> check which files already live in the target directory (.1 .2 .3 .4 .5 etc)
> rotate file to $output file.1 (or .2 .3)
> and restart the printing process
>
>
>
> Is something like this possible (or perhaps easier or something?)
>
> Thanks in advance!!
>


Sounds like a well thought out plan, and is definitely doable. Add a few
punctuation marks, a call to 'stat' (or maintain an internal counter of
the amount written so far) and you are almost done ;-).
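
The plan quoted above, with an internal counter in place of stat, might look roughly like this. It is only a sketch: the sub name split_by_lines and the naming of the pieces ("input.1", "input.2", ...) are assumptions, and it starts numbering at 1 instead of checking which files already exist in the target directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $in_name line by line into "$in_name.1",
# "$in_name.2", ..., starting a new piece before one would grow past
# $max_bytes. Returns the number of pieces written.
sub split_by_lines {
    my ($in_name, $max_bytes) = @_;

    open(my $in, '<', $in_name) or die "cannot open $in_name: $!";

    my ($piece, $written, $out) = (0, 0, undef);
    while (my $line = <$in>) {
        my $len = length $line;    # length as read; no encoding layer is set

        # Rotate when this line would not fit. A single over-long line
        # still gets a piece of its own rather than being cut.
        if (!$out || ($written > 0 && $written + $len > $max_bytes)) {
            close($out) or die "cannot close $in_name.$piece: $!" if $out;
            ++$piece;
            open($out, '>', "$in_name.$piece")
                or die "cannot write $in_name.$piece: $!";
            $written = 0;
        }
        print {$out} $line or die "cannot write $in_name.$piece: $!";
        $written += $len;          # the internal counter, instead of stat
    }
    close($out) or die "cannot close $in_name.$piece: $!" if $out;
    close($in)  or die "cannot close $in_name: $!";
    return $piece;
}
```

Keeping lines whole means the pieces are at most the requested size rather than exactly that size, which is usually what you want for plain-text logs.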

If your intention is just to reassemble them later, there are pre-written
programs that might be more appropriate; see also man dd.
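
On Unix-like systems, split and cat are the usual pre-written candidates. If the reassembly has to happen in Perl anyway, a sketch could look like the following; join_pieces and by_suffix are hypothetical helper names, and the pieces are assumed to end in a numeric suffix such as .1 or .2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write the given pieces, in the order given, to $out_name.
sub join_pieces {
    my ($out_name, @pieces) = @_;
    open(my $out, '>', $out_name) or die "cannot write $out_name: $!";
    binmode($out);
    for my $piece (@pieces) {
        open(my $in, '<', $piece) or die "cannot open $piece: $!";
        binmode($in);
        while (1) {
            my $got = read($in, my $buf, 1024 * 1024);   # copy 1 MB at a time
            die "read error on $piece: $!" unless defined $got;
            last if $got == 0;
            print {$out} $buf or die "cannot write $out_name: $!";
        }
        close($in) or die "cannot close $piece: $!";
    }
    close($out) or die "cannot close $out_name: $!";
    return;
}

# Order names by their numeric suffix, so "syslog.10" comes after "syslog.9".
sub by_suffix {
    my @names = @_;
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ /\.(\d+)\z/ ? $1 : 0, $_ ] } @names;
}
```

The numeric sort matters once there are ten or more pieces, because a plain string sort would put .10 before .2.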

http://danconia.org

Wc Jones

2004-04-22, 2:34 pm

> set size arguments
> set input file
> set output file
>
> open the file
> read the content
> print line of the content to $output file
> check size of $output file
> continue if not too big yet
> else
> check which files already live in the target directory (.1 .2 .3 .4 .5 etc)
> rotate file to $output file.1 (or .2 .3)
> and restart the printing process



Here is the splitting portion:

#!/usr/local/bin/perl

use strict;
use warnings;

# Example data - 85_782 lines, 1_072_787 (words), 10_313_190 bytes - filename: syslog

my $split_into = 3;
my $line_cnt   = 0;

# First pass: count the lines so each piece gets an equal share.
open(my $in, '<', 'syslog') or die "cannot open syslog: $!";
++$line_cnt while <$in>;
close($in) or die "cannot close syslog: $!";

# Second pass: copy that share of lines into syslog.0, syslog.1, ...
open($in, '<', 'syslog') or die "cannot re-read syslog: $!";
for (my $x = 0; $x < $split_into; ++$x) {
    open(my $out, '>', "syslog.$x") or die "cannot write to syslog.$x: $!";
    my $counter = 0;
    while (my $line = <$in>) {
        print {$out} $line;
        last if ++$counter >= $line_cnt / $split_into;
    }
    close($out) or die "cannot close syslog.$x: $!";
}

close($in) or die "cannot close syslog: $!";
print "Done ... \n\n";
__END__


Cheers!
-Sx-

--
Overheard: Isn't this all kinda sudden? Mentor: Yes. Sometimes,
you just know that it's time to say goodbye. And the moment you know
it, you must do it. Teaching students on anything less than 100%
motivation and energy is not how it should be done.

Remko Lodder

2004-04-23, 4:39 pm

Thanks guys!! All of that helped me; I got it to work.
I used the script below and added a few things (well, a friend of mine
came up with them):

my $size = (stat('/usr/messages.sorted'))[7];
my $chunksize = 10 * 1024 * 1024; # 10 MB
my $split_into = $size / $chunksize;

That makes every chunk roughly 10 megabytes, so I don't have to guess the number of pieces :-)
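
One detail worth spelling out: $size / $chunksize is usually not a whole number, and because the script divides the file by line count, the pieces only come out near 10 MB when the lines are of similar length. A small sketch of the arithmetic, rounding up to a whole number of pieces; pieces_needed is a hypothetical helper name, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(ceil);    # core module; ceil() rounds up to a whole number

# Hypothetical helper: how many pieces of $chunksize bytes cover $size bytes.
sub pieces_needed {
    my ($size, $chunksize) = @_;
    die "chunk size must be positive\n" unless $chunksize > 0;
    my $n = ceil($size / $chunksize);
    return $n < 1 ? 1 : $n;    # an empty file still yields one (empty) piece
}

my $chunksize = 10 * 1024 * 1024;                       # 10 MB, as above
print pieces_needed(10_313_190, $chunksize), "\n";      # syslog example: prints 1
print pieces_needed(2 * 1024**3, 650 * 1024**2), "\n";  # 2 GB in 650 MB pieces: prints 4
```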

Again: Thanks for the pointers and the help!!

Cheers



Copyright 2008 codecomments.com