MeteoLab. Meteorological Machine Learning Toolbox for Matlab

Written by

Antonio S. Cofiño, Rafael Cano, Carmen Sordo, Cristina Primo, and José Manuel Gutiérrez
AIMet Research Group (Artificial Intelligence in Meteorology).
Cantabria University & Spanish National Weather Service (INM)
http://grupos.unican.es/ai/meteo


This toolbox is a software companion for the book:

Redes Probabilísticas y Neuronales en las Ciencas Atmosféricas, J.M. Gutiérrez, R. Cano, A.S. Cofiño y C. Sordo. Monografías del Instituto Nacional de Meteorología (Ministerio de Medio Ambiente, 2004).

Preprint pdf copy available.

Major Features:

(Last updated on 25 July 2004)


How to use the toolbox:

Before using MeteoLab it is necessary to run the init script in the MLToolbox/MeteoLab directory (the working directoy). For instance, if you unzip the toolbox in the root directory C:/, you have to proceed as follows:

>> cd C:/MLToolbox/MeteoLab
>> init

The above script configures the working directories and includes the necessary paths for the current session (if you save the current Matlab path: "File->SetPath; File->SavePath", you won't need to run the init script in future sessions).

The toolbox contains four different directories:

1. Loading Observations

MeteoLab allows loading observations data from predefined networks of stations. The following example illustrates how to load precipitation data from the GSN (Global Station Network) in Europe (this network is included in the toolbox for illustrataion purposes):

Example.Network={'GSN'}; 
Example.Stations={'Europe.stn'};
Example.Variable={'Precip'}; 
period={'1-Jan-1999','31-Dec-1999'};
[data,Example]=loadStations(Example,'dates',period,'ascfile',1);

data is a matrix containing the observations for each of the stations, and Example is a structure which provides information above the network, period, and stations. The variable Info of he Example structure provides further information for each of the loaded stations:

Example.Info

ans =
	  Id: {29x1 cell}
     Name: [29x28 char]
     Height: [29x1 double]
     Location: [29x2 double]
     Length: [29x1 double]
     StepDate: {1x29 cell}
     MissingPercent: [29x1 double]
     StartDate: [29x11 char]
     EndDate: [29x11 char]

For instance, the names of the stations:

Example.Info.Name
ans =
	  RENNES 
 	  STRASBOURG-ENTZHEIM
     ...

Several functiona are included for drawing the stations and data:

the upper panel of the observations visualization window displays the spatial pattern for a given day. The bottom panel shows the available time series for a given station (the one marked with a cross in the upper panel). This figure is interactive, so the current station can be changed by right-clicking on the circle corresponding to the desired station in the upper panel. Similarly, the day can be selected by left-clicking on a day in the bottom panel (the current date is shown by a red line).
The user can easily work with its own data by creating a new network in the ObservationsData directory and including the data in a simple plain text format, as explained in ObservationsData/MyStations/readme.txt. The network MyStations has been included to allow users to include data in a simple form, and to show the structure used in MeteoLab to define observations data.

More examples can be found in the file Examples.m in the directory 'MLToolbox/MeteoLab/Observations'

2. Loading NWP Reanalysis Gridded Outputs

We can also work with reanalysis data from numerical circulation models, defined on certained gridded zones of interest. For illustrative purposes, MeteoLab includes ERA40 reanalysis information from some predefined zones of interest (NAO and Spain). Each of these zones is defined by a text file (domain.cfg) contained in the directory PatternsData. This file also contains information about the period, variables and levels of interest. In the example below the only variable of interest is MSLP (parameter 151; see ECMWF's parameter list) in surface (level 0):

dmn=readDomain('Nao');
drawAreaPattern(dmn)
dmn
dmn =	  
	  lon: [1x31 double]
     lat: [1x19 double]
     lvl: 0
     tim: 12
     par: 151
     nod: [1x589 double]
     src: [1x67 char]
     startDate: '01-Sep-1957'
     endDate: '31-Aug-2002'
     path: 'C:/Users/MeteoLab/MeteoLab/../PatternsData/NAO/'


The directories corresponding to the zones may also contain compressed information of the reanalysis atmospheric fields specified in the domain. In this case, the patterns can be extracted and displayed efficiently:

field=getFieldfromEOF(dmn,'ncp',50,'var',151,'time',12,'level',0,...
   'startdate','01-Aug-1992','enddate','09-Aug-1992');
drawGrid(field.dat(1:end,:),dmn)



Principal Components (PCs) are used to compress the data, so the number of PCs used to reconstruct the fields can be specified (to remove noise, etc.). If no indication is given, the fields are reconstructed using all the available PCs (in the examples included in MeteoLab, only 10% of the PCs are retained).
The user can easily define its own areas of interest (region, grid resolution, and variables) and generate the PCs from the GRIB files of the reanalysis data (which have to be included in the directory NWPData).

MeteoLab is also able to load information in GRIB format. To this aim, the domain.ctl file contains a parameter called src, which specifies the directory containing the GRIB files necessary to define the information of the zone. Two different commands are given in the toolbox to load reanalysis and forecast data respectively:

Loading reanalysis GRIB data (getFieldfromGRIB):

	dmn=readDomain('Iberia');
   ctl.fil='era40.ctl';
   ctl.cam=dmn.src;
   dates={'01-Dec-1957','31-Dec-1957'};
   [patterns,fcDate]=getFieldfromGRIB(ctl,dmn,'dates',dates);
   % Drawing geopotential (129) at 12 UTC (12) at 1000 mb
   data=patterns(:,findVarPosition(129,12,1000,dmn));
   drawGrid(data(1,:),dmn)

Loading forecast data (getFRCfromGRIB). Total precipitation over Spain for the year 1999:

% Reading forecasts fields in GRIB format. getFRCfromGRIB
% units: mm (x1000 to convert from original units)
dmn=readDomain('IberiaPrecip');
ctl.fil='IberiaPrecip.ctl';
ctl.cam=dmn.src;
date=datevec(datenum('01-Jan-1999'):datenum('31-Dec-1999'));
[patterns,fcDate]=getFRCfromGRIB(ctl,dmn,date,00,00);
%Adding large scale and convective precip and drawing the fields
precip=1000*(patterns(:,findVarPosition(142,30,0,dmn))+...
	patterns(:,findVarPosition(143,30,0,dmn)));
precip=sum(precip,1); %Accumulated precipitation
drawGrid(precip,dmn);

There is also a general function readmessage to read messages in GRIB files.
The read_GRIB matlab toolbox is also included so users familiar with this framework can easily load their data into MeteoLab.

3. PC-EOF Analysis

The following example illustrates the use of MeteoLab to compute, store and load PCs and EOFs for a given area using the domain.cfg which defines grid area, variables, etc.

% Computing EOFs from ERA40 grib data (this process is time consumming)
dmn=readDomain('Nao');
ctl.cam=dmn.src;
ctl.fil='era40.ctl';
[fields,dmn]=getFieldfromGRIB(ctl,dmn);
[EOF,CP,MN,DV,OP]=computeEOF(fields,'ncp',50);
% Loading stored EOFs (previously computed)
[EOF,CP,MN,DV]=getEOF(dmn,'ncp',50,...
   'startDate','01-Jan-1958','endDate','31-Dec-2001');
% Drawing EOFs (columns of matrix EOF)
drawGrid(EOF(:,1:4)',dmn,'iscontourf',1);




drawGrid(EOF(:,[25 50])',dmn,'iscontourf',1);

% Drawing CPs (columns of matrix EOF)
figure
subplot(2,1,1); plot(CP(:,1)); %Coefficients of the first EOF
subplot(2,1,2); plot(CP(:,50)); %Coefficients of the 50th EOF

4. Statistics: Weather Generators, Clustering, etc.

The following example illustrates the commands to work with weather generators and clustering algorithms.

Example1.Network={'GSN'};
Example1.Stations={'Spain.stn'};
Example1.Variable={'Precip'};
date={'1-Jan-1998','31-Dec-1998'};
[data,Example1]=loadStations(Example1,'dates',date,'ascfile',1);
data=data(:,1); % Selecting the first stataion: San Sebastian.


umbral=[0,0.5,inf];
% Precipitation amount (continuous variable)
[s2,r2]=weatherGen(data,umbral,2,10*size(data,1),'wg');
plot(s2(1:365,1)) % Symbolic (discrete)


figure
subplot(2,1,1); plot(data(1:end,1)) % Observed
subplot(2,1,2); plot(r2(1:size(data,1),1)) % Simulated




Clustering is performed using the makeClustering command:

%Loading NAO CPs for atmospheric patterns clustering
dmn=readDomain('Nao');
[EOF,CP]=getEOF(dmn,'ncp',50);
Clustering=makeClustering(CP,'Kmeans',[100]);
figure
plot(CP(:,1),CP(:,2),'b.')
hold on
plot(Clustering.Centers(:,1),Clustering.Centers(:,2),'ko',...
  'MarkerSize',6,'MarkerEdgeColor',[0 0 0],'MarkerFaceColor',[1 0 0])



5. Regression and Time Series Forecast

The following example ...Loading hourly data from MyStations (available in the CD) Temperature in Oviedo (Spain)

Example.Network={'MyStations'};
Example.Stations={'hourlyData.stn'}; 
Example.Variable={'Common'};
[data,Example]=loadStations(Example,'ascfile',1);
drawObservations(data,Example.Info.Location,Example.Info.Name);
% Fitting the autoregressive model using ARfit package from:
% http://www.gps.caltech.edu/~tapio/arfit/
[w, A]=arfit(data,2,7); 
% The order is automatically selected from 2 to 7.
dat=data(1:10:end,1); %Resampling 
k=size(A,2); % Order of the model
lag=[];
for i=1:k+1
   lag=[lag; dat(i:end-(k+1)+i,1)'];
end
X=lag(1:k,:)'; % Lagged input variables
Y=lag(k+1,:)'; % Predicted values
pred=w+X*flipud(A');
figure; plot(pred,Y,'.k')




figure;plot([pred Y])
error=mean(abs(Y-pred))


The error as a function of sample time (from hour to day)

maxSample=20; %Maximum sample time 20 hours
error=[];
for sample=1:maxSample
   dat=data(1:sample:end,1); 
   [w, A]=arfit(dat,2,7); 
   k=size(A,2);
   lag=[];
   for i=1:k+1
   	lag=[lag; dat(i:end-(k+1)+i,1)'];
	end
   
	X=lag(1:k,:)'; 
	Y=lag(k+1,:)'; 
	pred=w+X*flipud(A');
	error=[error; mean(abs(Y-pred))]; 
end
figure
plot([1:maxSample],error,'r')



sdffg

6. Nerual Networks Forecast

Work in progress

7. Probabilistic Bayesian Networks Forecast

Work in progress

8. Validation of Probabilsitic Forecasts

Work in progress