Karla News

Multidimensional Arrays in Perl 5.0: A Boon to BioPerl (Part 1)


Users of BioPerl can now take advantage of Perl 5.0, which introduced references. References let Perl handle multidimensional arrays (arrays whose elements point to other arrays) and other nested data structures. Systems biologists and genomicists have to handle vast data files, and with these structures they can archive, query, modify and update that data with ease. Perl already has the advantage of processing data “on the fly”. Add multidimensional array handling and you end up with a scripting language that becomes a bioinformaticist’s best friend.

In a nutshell

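A reference is a scalar value that points to another data structure, such as an array or a hash. You can create one with square brackets (an anonymous array), curly braces (an anonymous hash) or a backslash in front of an existing variable, and you then use the reference to access the structure it points to.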

Example:

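$arrayref = [ 0, 3, 6, 9, 12, 15 ]; # square brackets: reference to an anonymous array

@array = ( 0, 3, 6, 9, 12, 15 );
$newarray = \@array; # backslash: reference to @array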
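Once you have a reference, you need to get back to the data behind it. The following is a minimal sketch of the usual dereferencing forms; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Square brackets return a reference (a scalar) to a new, anonymous array.
my $arrayref = [ 0, 3, 6, 9, 12, 15 ];

# A named array, and a reference to it made with the backslash operator.
my @array    = ( 0, 3, 6, 9, 12, 15 );
my $newarray = \@array;

# Three equivalent ways to reach the third element through a reference.
my $third_block = ${$arrayref}[2];    # block form
my $third_short = $$arrayref[2];      # short form
my $third_arrow = $arrayref->[2];     # arrow form, usually the clearest
print "$third_block $third_short $third_arrow\n";

# The whole array behind a reference, and its number of elements.
my @copy  = @{$newarray};
my $count = scalar @{$newarray};
print "count = $count, elements = @copy\n";
```

The arrow form is the one you will see most often in BioPerl-style code, because it chains naturally once arrays start holding other arrays.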

When does a biologist need multi-dimensional arrays?

When one is working with bioinformatics, the question is more like: when does one NOT need multi-dimensional arrays?
From a list of all microarray experiments related to different types of disease to pathway analysis information, one needs a variety of “tables of tables”, “hashes of arrays”, “arrays of arrays” (scary-cool!) and so on.
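To make the “array of arrays” idea concrete, here is a small sketch of an expression table held as one. The gene names and values are made up for illustration and do not come from any of the experiments below.

```perl
use strict;
use warnings;

# Hypothetical expression table: one row per gene, one column per sample.
# Gene names and numbers are invented for illustration only.
my @genes  = ( 'GENE1', 'GENE2', 'GENE3' );
my @matrix = (
    [ 2.1, 2.4, 0.9 ],    # GENE1
    [ 0.5, 0.7, 0.6 ],    # GENE2
    [ 3.3, 3.0, 1.2 ],    # GENE3
);

# Two-index access: row 2 (GENE3), column 1 (the second sample).
my $value = $matrix[2][1];    # same as $matrix[2]->[1]

# Walk the table row by row; each row is an array reference.
for my $i ( 0 .. $#matrix ) {
    my $row = $matrix[$i];
    printf "%s: %s\n", $genes[$i], join( "\t", @{$row} );
}
```

The outer array only stores references, so each row can have its own length if the data calls for it.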

Sample:

I would like to look at all data from experiments related to kidney cancer before I proceed to work on it.
Here are some experiments that I have found in public databases.

Platform A: Microarray data from men and women over fifty with kidney cancer, living in metro areas with a high incidence of carcinogenic dyes in food (hypothetical). Six replicates: two with control data from healthy kidneys and two from each gender; treated with the drug GROW.
Platform B: Microarray data from drug company X on patients with kidney cancer, some showing fewer cancerous cells in the kidneys after treatment with drugs M, N and B (several files: two replicates and one control); treated with the drug MNB.
Platform C: Microarray data from mice with carcinogen-induced kidney cancer. Two different mouse models were used to extract the best-defined set of genes upregulated in cancer. Categories: four control sets, one treated with a drug, one fed a diet high in antioxidants before induction, male, female and so on; treated with a carcinogen and the drug GHI.


Experiment A: treated with the drug GROW (kidney cancer, patients over fifty); six files/datasets, three categories
Experiment B: treated with the drug MNB; more than eight files/datasets, more than three categories
Experiment C: treated with a carcinogen and the drug GHI; ten files/datasets, more than four categories
Experiment D: treated with the drug FNB; four files/datasets, two categories
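The experiment list is a natural fit for an “array of hashes”. Below is a sketch, assuming field names (id, drug, datasets, categories) of our own choosing. The counts are kept as text because some entries are only approximate (“more than eight”).

```perl
use strict;
use warnings;

# One hash reference per experiment; the field names are an illustrative choice.
my @experiments = (
    { id => 'A', drug => 'GROW', datasets => 'six',             categories => 'three' },
    { id => 'B', drug => 'MNB',  datasets => 'more than eight', categories => 'more than three' },
    { id => 'C', drug => 'GHI',  datasets => 'ten',             categories => 'more than four' },
    { id => 'D', drug => 'FNB',  datasets => 'four',            categories => 'two' },
);

# Query: find the experiment treated with MNB.
my ($mnb) = grep { $_->{drug} eq 'MNB' } @experiments;
print "Experiment $mnb->{id}: $mnb->{datasets} datasets\n";

# List every experiment with its drug and categories.
for my $exp (@experiments) {
    print "$exp->{id}\t$exp->{drug}\t$exp->{categories}\n";
}
```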

Here is a hash built from the above information, pairing each experiment with a platform:

Experiment A, platform 1
Experiment B, platform 2
Experiment C, platform 4
Experiment D, platform 3
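Written as Perl, that hash could look like the sketch below. The variable name %platform_of is only an illustrative choice.

```perl
use strict;
use warnings;

# Experiment => platform, as listed above.
my %platform_of = (
    'Experiment A' => 1,
    'Experiment B' => 2,
    'Experiment C' => 4,
    'Experiment D' => 3,
);

# Look up a single key.
my $platform_c = $platform_of{'Experiment C'};

# Check whether a key is present before relying on it.
my $has_e = exists $platform_of{'Experiment E'} ? 1 : 0;

# Hashes are unordered, so sort the keys for stable output.
for my $exp ( sort keys %platform_of ) {
    print "$exp, platform $platform_of{$exp}\n";
}
```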

The “boon”: the solution

The above example is an instance of a “hash”, a set of key/value pairs. Here each experiment, and with it the drug it was treated with, acts as the key.

In Perl, the elements of an array and the values of a hash must be scalars. To store arrays inside arrays, arrays inside hashes or hashes inside arrays, we therefore need a “reference”.
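Because a hash value must be a scalar, a “hash of arrays” stores an array reference under each key. This sketch keys the hash by drug, using the drug/experiment pairs from the list above; the extra entries 'Experiment A2', 'Experiment E' and the drug XYZ are hypothetical.

```perl
use strict;
use warnings;

# Drug => reference to an array of experiments (a "hash of arrays").
my %experiments_by_drug = (
    GROW => ['Experiment A'],
    MNB  => ['Experiment B'],
    GHI  => ['Experiment C'],
    FNB  => ['Experiment D'],
);

# Add another experiment for an existing drug (hypothetical name).
push @{ $experiments_by_drug{GROW} }, 'Experiment A2';

# Pushing onto a missing key creates the array automatically
# (autovivification). XYZ and Experiment E are hypothetical.
push @{ $experiments_by_drug{XYZ} }, 'Experiment E';

for my $drug ( sort keys %experiments_by_drug ) {
    my @list = @{ $experiments_by_drug{$drug} };
    print "$drug: @list\n";
}
```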

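A reference is itself a scalar value, so it can be stored anywhere a scalar can.
The other data structure we will use is the array.
There are several ways of storing and processing values. References are commonly created in two ways: with a backslash in front of a named variable, or with square brackets (or curly braces) that build an anonymous array (or hash).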
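The two ways of creating a reference behave differently: a backslash points at the existing array, while square brackets build a new array holding a copy of the values. A minimal sketch showing the difference:

```perl
use strict;
use warnings;

my @drugs = ( 'GROW', 'MNB', 'GHI' );

# 1) Backslash: a reference to the existing, named array.
my $by_backslash = \@drugs;

# 2) Anonymous array constructor: a reference to a NEW array with a copy.
my $by_brackets = [@drugs];

# Change the original array...
push @drugs, 'FNB';

# ...the backslash reference sees the change, the anonymous copy does not.
my $n_backslash = scalar @{$by_backslash};
my $n_brackets  = scalar @{$by_brackets};
print "backslash: $n_backslash, brackets: $n_brackets\n";
```

Use the backslash when you want the reference to track the original array, and square brackets when you want an independent snapshot.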


1) THE BACKSLASH METHOD FOR MULTI-DIMENSIONAL ARRAYS
We now have to store a reference to our array in a scalar.
When you put a backslash (“\”) in front of an array, you get a reference to it.

Our original array:

@newdrugarray

A reference to it, stored in a scalar:

$newref = \@newdrugarray;
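Putting the backslash method to work, here is a sketch of a two-dimensional @newdrugarray: each row is a named array, and the outer array holds backslash references to those rows. The row layout (drug, experiment, platform) is an assumption that combines the drug list and the experiment/platform hash above.

```perl
use strict;
use warnings;

# One named array per drug: drug, experiment, platform (illustrative layout).
my @grow = ( 'GROW', 'Experiment A', 1 );
my @mnb  = ( 'MNB',  'Experiment B', 2 );
my @ghi  = ( 'GHI',  'Experiment C', 4 );
my @fnb  = ( 'FNB',  'Experiment D', 3 );

# The outer array holds references (scalars) to the row arrays.
my @newdrugarray = ( \@grow, \@mnb, \@ghi, \@fnb );

# A single reference to the whole two-dimensional structure.
my $newref = \@newdrugarray;

# Two-index access, directly and through the reference.
my $exp_of_ghi  = $newdrugarray[2][1];
my $plat_of_fnb = $newref->[3][2];

for my $row (@newdrugarray) {
    print join( ", ", @{$row} ), "\n";
}
```

If you build rows inside a loop instead, declare the row array with my inside the loop body. Each pass then creates a fresh array, so every \@row points to different data rather than all references pointing to the same array.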