|
|
SATAN Help
HOW TO USE THE FIT PACKAGE SATAN provides an elaborate FIT Package for data of ANALYZERS and PSEUDO ANALYZERS. The following kinds of data can be used as input for the SATAN fit package: ANALYZERS may contain: Digital data on an
equidistant grid. The x values may start with any value (integer, real, positive
or negative), and the step size may be any positive real number. Analog data with
constant bin size. The x range may start with any value (integer, real, positive
or negative) and the bin size may be any positive real number. PSEUDOANALYZERS may contain: Digital data with
constant or variable step size. The x values can have any value (integer, real,
positive or negative). Analog data with
constant or variable bin size. The x range may start with any value (integer, real, positive
or negative) and the bin size may be any positive real number. The value of the
bin size can differ from bin to bin. The width of the bin is defined by the
range from the lower limit of the current bin to the lower limit of the next
bin. The fit handling with data of pseudo analyzers: 1. The data of pseudo analyzers are defined in the dataset by: ==> A: X Y,LN1 D ==> C: x values y values errors ==> 1.5 7.3 0.5 ==> 3.5 6.3 0.4 ==> etc. 2. GREAD Read the data 3. GDISP Display the data points 4. F.... Define and perform fit (see below) 5. FRESULT List the fit result 6. FDISP / SAVE Draw the curve and prepare the data for GSAVE 7. GSAVE Write the fit curve to the dataset The data of pseudoanalyzers may have non-equidistant x values, integer or floating-point y values. They may be interpreted as analog or digital data (See HELP TYPE). The fit handling with data of analyzers: The fit to data in logarithmic representation can be performed: Errors can be defined Symmetrical vertical error bars defined in the dataset are taken into account.
Asymmetrical vertical error bars defined in the dataset are averaged and symmetrized
before entering into the fit. Two-dimensional data (array or analyzer) are also transferred to the SATAN fit
package when the command GREAD / FIT2D was entered and
when they are shown by GDISP. A function y=f(x) (e.g. a
polynomial) may be fitted to channels with z>0. The statistical weight due to the
number of counts is considered. If a polygon boundary is defined, only the data inside
this boundary are taken into account. (See also the command ARIDGE which offers more extended possibilities for
data of two-dimensional analyzers!) For fitting user-defined functions, a specific user-defined subroutine has to be
linked to GRAF (see below). The SATAN Fit Package This section provides a survey of the data fitting facilities. Following a general
description, the program executing the maximum-likelihood fit is shortly explained
("The Maximizer MLFIT")), some aspects of fitting and their realization are
described ("The Fitting Procedure") and, finally, a few applications are shown
("Examples"). In SATAN, a fit package is offered which permits a great variety and flexibility in data fitting. The programs are based on the maximizer 'MLFIT' /BLO71/. For a practically unlimited number of fit parameters this subroutine searches for local maxima of the logarithm of a likelihood function in parameter space by combining two methods, namely, the quadratic approximation developed from a Taylor's expansion and a gradient search algorithm. It allows for the fixing of parameters, giving limits that must not be exceeded by the parameters and performing a special error analysis derived from the true variation of the likelihood function near the maximum. In addition, any fit parameter may be correlated to another reference parameter by fixing the corresponding difference, sum, quotient or product. Two fit methods are offered which correspond to different definitions of the likelihood function: o The method of least squares which uses a likelihood function given by the
weighted sum of squares ('chi square'); The respective likelihood function may be plotted versus one or two fit parameters with a linear or logarithmic axis. The following fit functions are directly supported by commands: o a polynomial and the square root of a polynomial of any order, both optionally
multiplied by the exponential function of another polynomial, In addition to this set of standard functions the user may define his own fit function in a resident procedure, where the function value and first derivatives with respect to the fit parameters are calculated. The total fit function may be specified as the sum of up to ten standard or user-defined functions added initially or step by step after fitting successively parts of the spectrum. This provides a general handling of background fitting since any function may act as background for a superior one. Some fit utilities include the following. o Definition of spectrum windows to be exclusively considered by the fit The following chapter summarizes the principle of operation of the maximizer MLFIT. For detailed information refer to the original publication (flike,flsqu,fpois). Estimation of a parameter set "a = (a1 ... an)" of a theoretical function "f(x,a)" describing the experimental data "yi(xi)" is done by constructing and maximizing a likelihood function "L(a)", which represents the relative likelihood for the fit parameters to assume particular values. Two different definitions of likelihood functions are offered which correspond to fitting modes known as the least squares fit and the Poisson distribution fit. The logarithms "F(a) = ln{L(a)}" are given by the expressions F(a) = -1/2 SUM{ wi [f(xi,a) - yi]**2 } LEAST SQUARES where the weighting factors "wi" are determined by the experimental wi = ( Delta yi )**(-2), and F(a) = SUM{ yi ln f(xi,a) } - SUM{ f(xi,a) } POISSON DISTRIBUTION For a given parameter set the logarithm of the likelihood function is expanded into a Taylor series by considering the gradient "g" and the matrix of the second derivatives "G" neglecting higher-order terms. F(a + Delta a) = F(a) + [g]t Delta a - 1/2·[Delta a]t·G·Delta a An iteration step Delta a = K**(-1) g is calculated using a matrix K = G + eps1(2**l-1)·([g]t·G·g / [g]t·g)·I (I = unit matrix). The quantities "l" and "eps1" determine the step direction in parameter space; "l=0" corresponds to quadratic approximation, for "eps1(2**l-1) approximately 1" the step is carried out in the direction of the gradient. Usually starting with the value "l0=0" the parameter "l" is modified according to the result of the last step, which is classified by comparing the true variation of the likelihood function "Delta Ft", the estimated variation "Delta Fe" in case of quadratic behavior, and a third value Delta Fc = MAX{ eps2, eps3 |F(a + Delta a)| } into three categories: o Divergent: Final convergence is assumed on the following conditions: o "Delta Fe" is less than "Delta Fc", A step carried out into a forbidden region is simply treated as divergent; a
condition variable is set to one of the following reason codes: Generally the number of terms processed in an iteration equals the number of data points; a smaller value indicates the running number of the last fit point before occurrence of an error condition. Symmetric errors "sigma" and correlation coefficients "rho" for the fit parameters are derived from the inverted matrix of second derivatives of the likelihood function (error matrix). They may be used for considering error propagation; the error of a quantity "f" depending on two variables "x" and "y" is defined according to the formula sigmaf**2 = sigmax**2 · (df/dx)**2 + sigmay**2 · (df/dy)**2 + 2 · rhoxy · sigmax · sigmay · (df/dx) · (df/dy) where the expressions sigmax**2, sigmay**2 VARIANCES rhoxy, sigmax, sigmay COVARIANCES are represented by the diagonal and non-diagonal elements of the error matrix, respectively. More precise and generally asymmetric errors are obtained by variation of the likelihood function in the near of the maximum. This error analysis is time consuming, because additional iterations are performed. Fixing of a fit parameter is indicated by setting the corresponding partial
derivative to zero. If two parameters are strongly correlated, some matrix elements may
become very small during inversion. To avoid absurd results due to rounding errors, the
matrix is only partly inverted if certain matrix elements decrease by more than a factor
of "tau". The parameter corresponding to this matrix element is not modified
during the current iteration, i.e. the parameter is treated as fixed. The MLFIT parameters which may be modified interactively if required at any
critical fitting condition are listed below together with their default values: eps1 0.1 gradient search parameter eps2 0.1 convergence criterion parameter eps3 1.E-5 convergence criterion parameter eps4 0.5 convergence criterion parameter tau 1.E-5 matrix element limit for inversion l0 0 gradient search parameter, starting value ld 2 gradient serach parameter, increment ie 4 maximum number of iterations in error analysis. A frequent change of the gradient search parameter "l" between zero and
larger values during the iterations indicates a strong non-quadratic behavior of the
likelihood function; in such a case it may be useful to start with "l>0", or
to increase "ld" and decrease "eps1" simultaneously. If the accuracy
of the likelihood function is limited by large rounding errors, it is recommended to
increase eps2 and eps3; the latter parameter is essential and may be increased by orders
of magnitude. In the following, some aspects of data fitting and their realization are described. Initialization The command FINIT resets all fit variables and modes to defined starting values. Fit Windows Generally, all data shown after a display command are considered by a fit. By FWIN up to 30 parts of the spectrum may be specified either by cursor or terminal prompting loop, or via display windows or analyzer conditions defining the actual fit regions. Fit windows may extend beyond the displayed spectrum segment; irrespectively of this fact only data within the display region are fitted. Display window names and analyzer conditions are merely used as synonyms for channel limits; a fit window defined by this means is not affected by any display command. Note that unless explicitly demanded within the commands FIT or SET, bins with zero contents are not included by the fit. Temporary windows (valid only for a single command) may be given within the commands FIT (execution of fit), FLIST (evaluation of the fit function) and FSUM (sum and moments of raw and fit data). Calibration The x-values assigned to each bin may be redefined by modifying the parameter FXCAL of an analyzer with the command AMODIFY or by applying an arithmetic operation to the x values of a pseudo analyzer. Besides the major application to get an implicit transformation of fit results, The calibration may be used to avoid an overflow in numerical computations by reducing the x-values by orders of magnitude or to define the bin center according to the mean value of the contained events. For analog data, a channel "xi" is expected to include events with an observed value lying in the interval between "xi" and "xi+1"; accordingly linear calibration is effective with coefficients set to "a = 1" and "b = 0.5" (middle of the bin). For histograms (analog data), the x values are derived from the mean of the channel numbers contained by a bin. Fit Functions The various standard fit functions are defined by FPOL (polynomial or square root of a polynomial, optionally multiplied by an exponential function of a polynomial), FLEG (sum of legendre polynomials, optionally consisting of even or odd terms only and normalized), FEXP (sum of linear exponentials), FPEAKS (Gaussian or Lorentzian peaks), FBOX (convolution of rectangular box with a peak function), FPTAIL (convolution of an exponential with a peak function), and FPVOIGT (convolution of a Lorentzian peak with a Gaussian function). The command FMY acts as an interface to a user-defined fit function, which can be loaded dynamically. Background Any desired function can be considered as a background function since the command FLAST allows the total fit function to be composed of up to ten standard or user-defined components. By FLAST/KEEP the last-defined fit function is stacked (no matter if the fit was executed or not), i.e. the function specified next is taken as an additional term; successive fits adding components step by step may be executed on different windows. Each stacked function gets an internal serial number by which it is identified for deleting or displaying singly. Automatic renumbering is done after a component i is deleted by FLAST/DEL(i). Keeping the last fit as background, the parameters of the respective function(s) are not fixed automatically. Peaks Gaussian or Lorentzian peaks are specified by the command FPEAKS providing three ways: o limits and FWHM are found by the automatic peak search routine, Whenever peaks are defined by limits, initial values for the peak parameters area, center and full width at half maximum are derived from sum, mean and variance of the net data in the window referred to. If only centers are given as prompting input, the full width at half maximum may be specified, however, a default value is computed from the data. The product of width and peak height yields the initial area. The resulting function is displayed (unless the keyword NODISP is given). Lower limits of areas and widths are set to zero; limits for the peak centers are set equal to the window limits or at least half the corresponding width above or below the center, respectively. Modified peak functions, resulting from the convolution of a Gaussian with a rectangular and an exponential function, are specified by the commands FPBOX and FPTAIL, respectively. Parameter Values FPAR is the general command to assign initial values and permitted limits to fit parameters. Each parameter is addressed by a running number (corresponding to the description of the respective fit function). Alternatively, the commands FAREA, FPOS and FWIDTH are available for peak functions, where areas, centers and full widths at half maximum are specified by peak numbers. After definition of a function, fit parameters are set to arbitrary starting values (generally zero). However, parameter values of a user-defined function may be initialized within the procedure $MYFIT which is called by FMY, and guess values for parameters of peak functions (FPEAKS) are calculated from sum and moments of corresponding spectrum regions. In case of a structured likelihood function with several local maxima it might be useful to plot this function versus one or two fit parameters to find appropriate starting values for the fit. This is supported by the command FLIKE which evaluates the likelihood function of the current fit environment and stores the result in an analyzer; a reasonable step size of parameter variation avoids an excessive consumption of computing time. At any time, the fit function may be evaluated and displayed with the actual set of parameter values. Fixing Fit Parameters A fit parameter is allowed to be held constant during iterations by fixing its absolute value or the difference, sum, quotient or product with respect to any reference parameter (which for its part must not be relatively fixed). The guess value for a relatively fixed parameter may be given in relative mode, i.e. by a difference, sum, quotient or product which internally is transposed into an absolute value. Fixing and varying are done by FPAR or in case of peak parameters by FAREA, FPOS and FWIDTH. Execution of Fit The fit is executed by the command FIT using the mode of least squares or Poisson distribution, respectively. In case of least-squares fit weighting factors for the data are represented by the reciprocal squares of the experimental errors specified by FERR; if no errors are given, all data have constant weights. The fit may be very sensitive to the choice of appropriate weighting factors. Bins with zero contents are excluded from fit by default but may be demanded to be considered. Least squares weights if required are set to 1 unless a non zero error is given; however, for spectra with small data rates the Poisson fit mode is adequate. The execution of fit is stopped by reaching convergence, by a maximum number of iterations or by exceeding a CPU-time limit; it may of course be restarted with the actual fit parameter set. The fit may also be interrupted by pressing the "BREAK" button on the SATAN interface window. In case of critical fit conditions some MLFIT parameters may be altered by means of the command FMLFIT. A detailed output after each iteration comprising true and estimated values of the likelihood function, number of terms processed, "l"-parameter, condition flag in case of divergence and current fit-parameter values is given on the keyword LIST of the FIT command. Results Values, errors and correlations of the fit parameters are listed, prepared for GSAVE or stored into global parameters by FRESULT. The command FLIST produces an output of fit-function values and experimental data. Under some circumstances it may be preferable to determine sum and moments of raw data or raw data minus background fit in an interval rather than peak fitting. FSUM calculates the sum and first to third moments of a distribution "yi(xi)" due to the following expressions: s = SUM{ yi } SUM m = SUM{ yi · xi / s } MEAN sigma**2 = SUM{ yi · (xi - m)**2 / s } VARIANCE mu3 = SUM{ yi · (xi - m)**3 / s } SKEWNESS Graphs of the total fit function, selected components or single peaks are shown by
FDISP with an optional accuracy defined by a number of
polygon lines or cubic splines composing the line to be drawn. With FSTORE, fit data evaluated for each bin of the displayed
region (or the difference of experimental and fit data, optionally divided by Errors of Experimental and Fit Data Experimental errors (and hence weighting factors for least squares fit) may be specified by FERR to be undefined (corresponding to constant weighting factors), statistical (square root of observed counting rate), proportional (given by a percentual amount of the counting rate) or external (contained in a dataset). Initially, errors are undefined. In this case, no errors can be evaluated for the fit parameters. Symmetric errors and correlations of fit parameters are taken from the matrix of second derivatives; in addition, an extended error analysis generally yielding asymmetric errors may be demanded for particular parameters (flagged by FPAR, FAREA, FPOS, FWIDTH, respectively) by the keyword ERRANAL within the command FIT. Errors for fit data (FLIST, FSTORE) or net integrals of raw data (FSUM/FITERR) are derived from symmetric errors and correlation coefficients (which are essential for considering error propagation) and listed, provided that experimental errors have been defined. Problems In the following, a few critical fit conditions are described. In any case, the fit should be restarted using the keyword LIST of the command FIT to get a comprehensive output for each iteration which facilitates the solution of the problem. o An error condition occurs during iteration. The column 'TERMS' of the iteration output indicates the fit point processed last before occurence of the respective error denoted in the column 'COND' by a number: 2 fit parameter exceeds specified limit Depending on the error condition, different actions are recommended: - exclude the respective fit point by FWIN; o The fit converges very slowly ('further iteration recommended'). - Looping of the "l" parameter (column 'L' of the iteration
output) between 0 and larger values indicates a strong non-quadratic behavior of the
likelihood function. In case of error condition_2 check the fit parameter limits,
especially for peak positions (FPAR/LIST, FPOS/LIST); open parameter intervals which are set too
narrow (e.g. FPAR.../NOLIM). Otherwise retry the fit
with an increased MLFIT parameter "l0" (default 0), or increased "ld"
(default 2) and decreased "eps1" (default 0.1), set by FMLFIT. o The fit has reached convergence, but the result looks unreasonable. Fitting a small number of fit points (compared to the number of free fit
parameters), frequently parameters are fixed internally due to strong correlations (and
thus have unreasonable values), which is indicated by the number at column 'FIX' of the
iteration output or by missing fit-parameter errors. Decrease the parameter
"tau" (default 1E-5) by an order of magnitude with FMLFIT/TAU(...)
and restart fitting. Fit of Legendre polynomials to an angular distribution GDISP ... display command FLEG 2 / D E define fit function: Legendre polynomial of second order, x-values interpreted as angles in degrees, terms of even order only FIT execute fit FLEG 4 / D E add a term of fourth order FIT execute fit FRES / C list results including correlations FLIST list experimental and fit data Fit of a single Gaussian to displayed data GDISP ... display command FWIN 1 specify analyzer condition no. 1 as fit window FPEAK * / GAU define a Gaussian peak as fit function referring to the fit window FIT execute fit FRES list results FSUM / T compute sum and moments of raw data for comparison Fit of a spectrum of Gaussian peaks plus background GDISP ... display command FERR / S define experimental errors to be statistical FXCAL 0 0.01 decrease x-values in order to avoid overflow FWIN / LOOP define a set of windows for background fit by cursor loop FPOL 8 define background function: polynomial of eighth order FIT execute fit FPAR / FIX fix all background parameters FLAST / KEEP keep last defined function as background FWIN / ALL the whole displayed spectrum is considered by the fit FPEAK 10 search for the ten most significant peaks FIT execute fit FRESULT 2 list results of peaks only FDISP / F(1) draw background singly FDISP / PEAK draw peaks singly Fit of two correlated peaks with relatively fixed parameters GDISP ... display command FPEAK 2 / PROMPT enter two peak regions by cursor prompting loop FPOS 1 / D(2) G(-14.6) R fix the difference of both peak positions; assign a guess value to peak position "1" less than that of peak "2" by "-14.6" FWIDTH 1 / Q(2) G(1) R equate both peak widths and fix the ratio FIT / LIST execute fit giving a comprehensive output of all iterations, results and correlations FDISP / PEAK display peaks singly Fit 8 peaks, all widths are required to be equal GDISP ... FPEAK / PROMPT initialize peak positions and perform fit with FIT independant width parameters FWIDTH 1..7 / FIX REL QUOT(8) GUESS(1) fix relative width values to 8th peak FIT execute fit with relatively fixed width values Fit with the user-defined function GDISP ... display command FMY POLY define the user's fit function 'POLY' with the actual number of parameters prompted FPAR 1 / F G(0) set parameter 1 to zero and fix it FLAST / K keep user's function as background FPE / L add a Lorentzian function FIT execute fit FRESULT list results FDISP / F(1) draw the user-defined function singly Example for user-defined fit functions $MYFIT: PROC(C_NAME,I_NPAR,D_X,D_FPAR,I_SWIT,D_Y,D_DERIV,I_DERIV,I_RC); ; DECLARE C_NAME CHAR(*) VAR, /* FUNCTION NAME */ I_NPAR BIN FIXED(15), /* NO. OF PARAMETERS */ D_X DEC FLOAT(15), /* X VALUE */ D_FPAR(*) DEC FLOAT(15), /* FITPARAMETERS */ I_SWIT BIN FIXED(15), /* 1: INITIALIZE PARAMETERS */ /* 2: RETURN Y & DERIVATIVES */ D_Y DEC FLOAT(15), /* Y VALUE */ D_DERIV(*) DEC FLOAT(15), /* PARTIAL DERIVATIVES */ I_DERIV BIN FIXED(31), /* DERIVATIVES REQUESTED */ I_RC BIN FIXED(15); /* RETURN CODE */ /* */ DECLARE I BIN FIXED(15); /* LOOP VARIABLE */ ; DECLARE GPRTCL ENTRY(CHAR(*) VAR, BIN FIXED(15)), /* WRITE TO SCREEN AND PROTOCOL */ GPVRL ENTRY(CHAR(*) VAR, (*) DEC FLOAT(15)); /* PROMPTING ROUTINE */ ; SELECT (C_NAME); WHEN ('EXCOS') DO; /* DAMPED OSCILLATION */ IF I_SWIT=1 THEN CALL GPVRL('ENTER FIT PARAMETERS',D_FPAR); IF I_SWIT=2 THEN DO; IF D_FPAR(3)=0 THEN DO; I_RC = 3; /* FIT FUNCTION NOT CALCULABLE*/ RETURN; END; D_Y = D_FPAR(1) * EXP(-D_FPAR(2)*D_X) * COS(D_X/D_FPAR(3)); IF I_DERIV = 1 THEN DO; /* DERIVATIVES REQUESTED */ D_DERIV(1) = EXP(-D_FPAR(2)*D_X) * COS(D_X/D_FPAR(3)); D_DERIV(2) = D_Y * -D_X; D_DERIV(3) = D_FPAR(1) * EXP(-D_FPAR(2)*D_X) * -SIN(D_X/D_FPAR(3)) * D_X / D_FPAR(3)**2; END; ELSE D_DERIV = 0; END; END; WHEN ('POLY') DO; /* POLYNOMIAL OF ORDER I_NPAR-1*/ IF I_SWIT=1 THEN D_FPAR=0.; IF I_SWIT=2 THEN DO; D_DERIV(1) = 1.0E0; D_Y = D_FPAR(1); IF I_NPAR > 1 THEN DO I=2 TO I_NPAR; D_DERIV(I) = D_DERIV(I-1) * D_X; D_Y = D_Y + D_FPAR(I) * D_DERIV(I); END; END; END; OTHERWISE CALL GPRTCL('FUNCTION '||C_NAME||' NOT FOUND',1); END; ; I_RC = 0; RETURN; END $MYFIT; Link the GRAF package together with your $MYFIT function by
Independently of the above described fit
package, a
simple polynomial fit is available by the line symbol "F" in a GRAF
dataset. This fit is always performed to the representation of the data, like the fit
package does, if the option |