

Home > Archive > PERL Beginners > May 2004 > combining data from more than one file...










 

Author combining data from more than one file...
Michael S. Robeson II

2004-05-17, 5:30 am

Hi all,

I am having trouble with combining data from several files, and I can't
even figure out how to get started. So, I am NOT asking for any code
(though pseudo-code is ok) as I would like to try figuring this problem
out myself. So, if anyone can give me any references or hints that
would be great.

So, here is what I am trying to do:

I have, say, 2 files (I'd like to do this with as many files as the user
needs):

***FILE 1***
>cat

atacta--gat--acgt-
ac-ac-ggttta-ca--

>dog

atgcgtatgc-atcgat-ac--ac-a-ac-a-cac

>mouse

acagctagc-atgca--
----acgtatgctacg--atg-
***end file 1***


***FILE 2***

>mouse

aatctgatcgc-atgca--
----acgtaaggctagg-

>cat

atacta--gat--acgt-
ac-acacagcta--ca--

>dog

atgcgtatgc-atcgat
-ac--ac-a-ac-a-cac
***end file 2***

Basically, I would like to concatenate the sequences of each
corresponding animal so that the various input files would be output
to a file like so:

***output***
>cat

atacta--gat--acgt-ac-ac-ggttta-ca--atacta--gat--acgt-ac-acacagcta--ca--

>dog

atgcgtatgc-atcgat-ac--ac-a-ac-a-cacatgcgtatgc-atcgat-ac--ac-a-ac-a-cac

>mouse

acagctagc-atgca------acgtatgctacg--atg-aatctgatcgc-atgca------acgtaaggctagg-
***output end***

Notice that in the two files the data are not in the same order. So, I
am trying to figure out how to have the script work out what the
first organism is in FILE 1 (say "cat" in this case) and find the
corresponding "cat" in the other input files. It should then take the
sequence data (all the cat data) from FILE 2, concatenate it to the cat
sequence data from FILE 1, and write the result to an output file. Then
it should go on to the next organism in FILE 1 and search for that
organism in the other files (in this case FILE 2). I do not care about
the order of the data, only that the "like" data is concatenated together.

Again, I do NOT want this solved for me (unless I am totally lost).
Otherwise, I'll never learn. I would just like either hints /
suggestions / pseudo code / even links to books or sites that discuss
this particular topic. Meanwhile, I am eagerly awaiting my "PERL
Cookbook" and I'll keep searching the web.

-Thanks!
-Mike



Johan Viklund

2004-05-17, 10:30 am

On Sun, 16 May 2004 19:50:57 -0400, Michael S. Robeson II
<popgen23@mac.com> wrote:

> Hi all,

Hello and Welcome to the world of bioinformatics with perl!


....

I think you should take a look at bioperl, since this is genome data. For
this exercise it's not what you want, but it is worth knowing about if you
want to do more biology with perl (blast, interfacing with databases, easy
format conversion, and so on). Bioperl can be found at http://www.bioperl.org/

> ***FILE 1***
> atacta--gat--acgt-
> ac-ac-ggttta-ca--


....

>
> Again, I do NOT want this solved for me (unless I am totally lost).
> Otherwise, I'll never learn. I would just like either hints /
> suggestions / pseudo code / even links to books or sites that discuss
> this particular topic. Meanwhile, I am eagerly awaiting my "PERL
> Cookbook" and I'll keep searching the web.

So this was more like a link ;)



> -Thanks!
> -Mike


/Johan Viklund

Ps.
<off-topic>
Next exercise (or really the one before) would be to calculate the GC-skew.
</off-topic>
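
For anyone curious about that exercise: a minimal sketch, assuming the usual definition of GC skew as (G - C) / (G + C) counted over a single sequence. The sub name gc_skew is made up for illustration, and gap characters are simply ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: GC skew = (G - C) / (G + C) for one sequence.
# Gaps ("-") and any other characters do not affect the counts.
sub gc_skew {
    my ($seq) = @_;
    my $g = ($seq =~ tr/gG//);    # tr with an empty replacement only counts
    my $c = ($seq =~ tr/cC//);
    return if $g + $c == 0;       # skew is undefined without any G or C
    return ($g - $c) / ($g + $c);
}

my $skew = gc_skew('atgcgtatgc-atcgat');
print defined $skew ? "GC skew: $skew\n" : "no G or C in sequence\n";
```

In practice the skew is often computed over sliding windows along a genome rather than over a whole sequence at once; this sketch only shows the per-sequence arithmetic.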
Ramprasad A Padmanabhan

2004-05-18, 4:31 pm

Quite a unique case.
If your data is not very large, I would suggest you first read all the
data into one big hash (key is the animal, value is the sequence data)
and then just print the hash out to a file,
like this ( writing pseudo code is easier if written in perl :-) )

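use strict;
use warnings;

my @files = qw(file1 file2 file3);
my %alldata;
foreach my $f (@files) {
    open(my $in, '<', $f) or die "Couldn't open $f: $!";
    local $/ = "\n>";    # this way you read one ">name ..." record at a time
    while (my $record = <$in>) {
        chomp $record;             # strip the trailing "\n>" separator
        $record =~ s/^\s*>//;      # the first record still has its leading ">"
        my ($animal, $seq) = $record =~ /^(\S+)[^\n]*\n(.*)/s or next;
        $seq =~ s/\s+//g;          # join the wrapped sequence lines
        $alldata{$animal} .= $seq; # append to whatever earlier files gave us
    }
    close $in;
}

###### %alldata now maps each animal to its concatenated sequence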
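
The last step, printing the hash back out, is only hinted at above. A minimal sketch of it, assuming %alldata maps each animal name to its already-concatenated sequence; the sub name format_records and the sample data are made up for illustration, and the ">name", blank line, sequence layout just mirrors the example output in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a name => sequence hash into ">name" records.
sub format_records {
    my ($data) = @_;
    my $out = '';
    foreach my $animal (sort keys %$data) {    # sorted only for a stable order
        $out .= ">$animal\n\n$data->{$animal}\n\n";
    }
    return $out;
}

# Sample data standing in for what the reading loop would have collected.
my %alldata = (
    dog => 'atgc-a',
    cat => 'at--ga',
);

print format_records(\%alldata);
```

To write to a file instead of the screen, open a filehandle for writing and print the returned string to it. Hash keys come back in no particular order, which is fine here since the order of the animals does not matter; the sort is only there to make runs repeatable.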








