Fityk 0.7.5 - User's Manual

Marcin Wojdyr


Table of Contents

1. Introduction
What is the program for?
How to read this manual
GUI vs CLI
2. Getting started
The minimal example
Invoking fityk
Graphical interface
Plots and other windows
Mouse usage
3. Reference
General syntax
Data from experiment
Loading data
Active and inactive points
Standard deviation or weight
Data transformations
Functions and variables in data transformations
Working with many datasets
Exporting data
Sum of fitted functions
Sum - Introduction
Variables
Function types and functions
User-defined functions (UDF)
Speed of computations
Sum, F and Z
Guessing peak location
Displaying information
Fitting
Nonlinear optimization
Fitting related commands
Levenberg-Marquardt
Nelder-Mead downhill simplex method
Genetic Algorithms
Settings
Other commands
plot: viewing data
info: show information
commands, dump, sleep, reset, quit
4. Using and extending
Use cases
Extensions
How to add your own built-in function
A. List of functions
B. Command shortenings
C. License
D. About this manual
Bibliography

List of Equations

A.1. Gaussian
A.2. SplitGaussian
A.3. GaussianA
A.4. Lorentzian
A.5. LorentzianA
A.6. Pearson VII (Pearson7)
A.7. Split-Pearson-VII (SplitPearson7)
A.8. Pearson-VII-Area (Pearson7A)
A.9. Pseudo-Voigt (PseudoVoigt)
A.10. Pseudo-Voigt-Area (PseudoVoigtA)
A.11. Voigt
A.12. VoigtA
A.13. Exponentially Modified Gaussian (EMG)
A.14. Doniach-Sunjic (DoniachSunjic)
A.15. Polynomial5

Chapter 1. Introduction

What is the program for?

Fityk is a program for nonlinear fitting of analytical functions (especially peak-shaped ones) to data (usually experimental data). The shortest description: peak fitting software. There are also people who use it only to display data or to remove the baseline from data.

It is reported to be used in crystallography, chromatography, photoluminescence, infrared and Raman spectroscopy, and other fields. Although the author is familiar with none of these experimental methods except powder diffraction, he would like to make the program useful for as many people as possible.

Fityk offers various nonlinear fitting methods, easy background subtraction and other dataset manipulations, easy placement of peaks and changing of peak parameters, support for analyzing series of datasets, automation of common tasks with scripts, and much more. The main advantage of the program is its flexibility: parameters of peaks can be arbitrarily bound to each other, e.g. the width of a peak can be an independent variable, can be the same as the width of another peak, or can be given by a complicated formula common to all peaks.

Fityk is free software; you can redistribute and modify it under the terms of the GPL, version 2. See Appendix C, License for details. You can download the latest version of fityk from http://www.unipress.waw.pl/fityk or http://fityk.sf.net . To contact the author, visit the same pages.

How to read this manual

After this introduction, read Chapter 2, Getting started. If you are using the GUI, you can look at the screenshot-based tutorial on the web page and postpone reading Chapter 3, Reference until you need to write a script or want to understand better how the program works.

GUI vs CLI

The program comes in two versions: the GUI (Graphical User Interface) version, which is more convenient for most users, and the CLI (Command Line Interface) version (named cfityk to differentiate it; Unix only).

If the CLI version is compiled with the GNU Readline library, command line editing and command history are available, as in bash. TAB completion is especially useful. Data and the curve fitted to the data are visualized with the gnuplot program (if it is installed).

The GUI version is written using the wxWidgets library. One of the main features of this library is portability. The program can be run on Unix variants with GTK+ (it is developed on Linux) and on MS Windows. There are also people using it on MacOS X (have a look at the fityk-users mailing list archives for details).

Chapter 2. Getting started

The minimal example

Let us analyze a diffraction pattern of NaCl. Our goal is to determine the position of the center of the highest peak. It is needed to calculate the pressure under which the sample was measured, but the further processing does not matter in this example.

The data file used in this example is distributed with the program and can be found in the samples directory.

First, load the data from the file nacl01.dat. You can do this by typing @0 < nacl01.dat in the CLI version (or, in the GUI version, in the input box at the bottom, just above the status bar). In the GUI, you can instead select Data->Load File from the menu and choose the proper file.

If you use the GUI, you can zoom in on the biggest peak using the left mouse button on the auxiliary plot (the plot below the main plot). To zoom out, press the View whole toolbar button. Other ways of zooming are described in the section called “Mouse usage”. If you want the data to be drawn with bigger points or a line, or if you want to change the color of the line or background, press the right mouse button on the main plot and use the Data point size or Color entries in the pop-up menu. To change the color of data points, use the right-hand panel.

Now all data points are active. Because only the biggest peak is of interest, the rest of the points can be deactivated. Type: a = (23.0 < x < 26.0) or change to range mode (press the Data-Range Mode button on the toolbar) and select the range to be deactivated with the right mouse button.

We see that our data has no background we would have to worry about, so we only have to define a peak with reasonable initial parameter values and fit it to the data. We will use a Gaussian. To see its formula, type: info Gaussian or look it up in the documentation (in Appendix A, List of functions). By the way, most commands can be abbreviated, e.g. you can type: i Gaussian.

To define the peak, type: %p = Gaussian(~60000, ~24.6, ~0.2) -> F or %p = guess Gaussian, or select Gaussian from the list of functions on the toolbar and press the auto-add toolbar button. There are also other ways to add a peak in the GUI; try the add-peak mode. These mouse-driven methods give the function a name like %_1, %_2, etc.

Now let us fit the function. Type: fit or select Fit->Run from the menu (or press the toolbar button).

To see the peak parameters, type: info+ %p, or (in the GUI) move the cursor to the top of the peak and try out the context menu (right button), or use the right-hand panel.

That is all. To repeat the procedure later, you can write all the commands to a file (you can do it now with the command commands > filename) and use it as a script: commands < nacl01.fit, or select Session->Execute script from the menu, or run the program with the name of the script: bash$ fityk nacl01.fit

Invoking fityk

On startup, the program executes a script from the $HOME/.fityk/init file. Then it handles the files given as arguments on the command line. If a filename has the extension ".fit" or the file begins with the string "# Fityk", it is assumed to be a script and is executed. Otherwise, it is assumed to be a data file and is loaded. There are also other command-line options for the CLI and GUI versions of the program. The option "-h" gives the full listing.

     
     wojdyr@ubu:~/fityk/src$ ./fityk --help
     Usage: fityk [-h] [-V] [-c <str>] [-I] [-r] [script or data file...]
      -h, --help            show this help message
      -V, --version         output version information and exit
      -c, --cmd=<str>       script passed in as string
      -I, --no-init         don't process $HOME/.fityk/init file
      -r, --reorder         reorder data (50.xy before 100.xy)
    

Graphical interface

Plots and other windows

The window of the fityk program consists of (from the top): a menu bar, toolbar, main plot, auxiliary plot, output window, input field, status bar, and a sidebar on the right-hand side. The input field allows you to type and execute commands in a similar way as in the CLI version. The output window (which is configurable through its pop-up menu) shows the results. In fact, all GUI commands are converted into text and shown in the output window.

The main plot can display data points, functions and/or the sum of all functions. Use the pop-up menu (click the right button on the plot) to configure it. Some properties of the plot (e.g. colors of data points) can be changed using the sidebar.

One of the most useful things the auxiliary plot can display is the difference between the data and the sum of functions. The plot can be controlled through its pop-up menu. A quick look at this menu and a minute of experimenting should make all the possibilities of the auxiliary plot clear.

The configuration of the GUI (visible windows, colors, etc.) can be saved using GUI->Save current config. Two different configurations can be saved, which allows easy switching of colors for printing. On Unix platforms, these configurations are stored in a file in the user's home directory. On Windows, they are stored in the registry (perhaps in the future they will also be stored in a file).

Mouse usage

The usage of the mouse in menus, dialog windows, the input field and the output window is intuitive, so the only topic described here is how to operate the mouse effectively on the plots.

Let us start with the auxiliary plot. The right button displays the pop-up menu; with the left button you can select the range to be displayed (the range on the x axis). Clicking with the middle button (or with the left button while Shift is pressed) zooms out and displays all data.

On the main plot, the meaning of the left and right mouse buttons depends on the current mode, which can be changed using the toolbar or menu. There are hints on the status bar. In normal mode, the left button is used for zooming and the right one invokes the pop-up menu. The same behaviour can be obtained in any mode by pressing Ctrl (or Alt). The middle button can be used to select a rectangle to zoom in on. If an operation has two steps, like rectangle zooming (first you press the button to select the first corner, then you move the mouse and release the button to select the second corner of the rectangle), you can cancel it by pressing another button while the first one is pressed.

Chapter 3. Reference

General syntax

Basically, one command occupies one line. If, for some reason, it is more convenient to place more than one command on a line, they can be separated with a semicolon (;).

Most commands can have arguments separated by commas (,), e.g. delete %a, %b, %c.

Most commands can be shortened: e.g. you can type inf, in or i instead of info. See Appendix B, Command shortenings for details.

The symbol '#' starts a comment: everything from the hash (#) to the end of the line is ignored.

Data from experiment

Loading data

Data are stored in files. Unfortunately, there are various file formats for data. The basic one is a text file with every line corresponding to one data point. Each line should contain at least two numbers: the x and y of a point. It can also contain the standard deviation of the y coordinate. Numbers can be separated by whitespace, commas, colons or semicolons. Some lines may contain comments or extra information. If such lines have a hash (#) in the first column, they are ignored; otherwise, they are also ignored (unless they can be read as a data point). Other file types can also be read: .rit, .cpi, Siemens-Bruker .raw and .mca. In the future, the way special file formats are handled will change (an external library will be used for this; unfortunately, that library does not exist yet).

Points are loaded from files using command

dataslot < filename [filetype] [xcol,ycol [,scol] ]

where dataslot should be replaced with @0, unless many datasets are used simultaneously (for details see: the section called “Working with many datasets”), filetype can be omitted (at this moment, due to the small number of supported formats, the filetype can be detected automatically), and xcol, ycol, scol are unsigned integers. If the filename contains blank characters, a semicolon or a comma, it should be put inside single quotation marks. If the file is in a text format (columns of numbers), you can specify which columns contain x, y and, optionally, the standard deviation of y.

Some information about current data can be obtained using command:

info dataslot
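
A few examples of loading data (nacl01.dat comes from the samples directory; the other file names are hypothetical):

     @0 < nacl01.dat        # load a text file into the first slot
     @0 < 'two theta.dat'   # quote file names that contain blanks
     @0 < data.txt 1,2,3    # x from column 1, y from column 2, std. dev. from column 3
     info @0                # show basic information about the loaded data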

Active and inactive points

We often have a situation where only part of the data from a file is of interest. We should be able to exclude selected points from fitting and all computations. Every point can be either active or inactive. This can be done with the command A=... (see the section called “Data transformations” for details) or with a mouse click in the GUI. The idea of active and inactive points is simple: only the active ones are subject to fitting and peak-finding; the inactive ones are ignored in these cases.
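
For example, the transformation used earlier in the minimal example leaves active only the points in a given x range:

     A = (23.0 < x < 26.0)  # only points with x between 23 and 26 stay active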

Standard deviation or weight

When fitting data, we assume that only the y coordinate of the data is subject to statistical errors in measurement. This is a very common assumption. To see how the y standard deviation, sigma, influences fitting (optimization), look at the weighted sum of squared residuals formula in the section called “Nonlinear optimization”. We can also think in terms of weights of points: every point has an assigned weight equal to wi=1/sigma^2.

The standard deviation of points can be read from a file together with the x and y coordinates. Otherwise, it is set to max(sqrt(y), 1.0). Setting the std. dev. to the square root of the value is common and has a theoretical basis when y is the number of independent events. You can always change the standard deviation, e.g. make it equal for every point with the command: S=1. See the section called “Data transformations” for details.
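
For example (a sketch using functions described in the next section):

     S = max2(sqrt(y), 1)  # restore the default: max(sqrt(y), 1.0)
     S = 1                 # or give every point the same weight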

Data transformations

Every data point has four properties: the x coordinate, the y coordinate, the standard deviation of y and an active/inactive flag. Lower case letters x, y, s, a stand for these properties before transformation, and upper case X, Y, S, A for the same properties after transformation. M stands for the number of points. You can transform data using assignments. The command Y=-y will change the sign of the y coordinate of every point. You can also apply a transformation to selected points: Y[3]=1.2 will change the point with index 3 (which is the 4th point, because the first has index 0), and Y[3...6]=1.2 will do the same for points with indices 3, 4, 5, but not 6. Y[2...]=1.2 will apply the transformation to the point with index 2 and all following points. You can guess what Y[...6]=1.2 does. Most operations are executed sequentially for points from the first to the last one. n stands for the index of the currently transformed point. The sequence of commands: M=500; x=n/100; y=sin(x) will generate a sinusoid dataset with 500 points.

If you have more than one dataset, you have to specify explicitly which dataset a transformation applies to. See the section called “Working with many datasets” for details.

Points are kept sorted according to the x coordinate, so changing the x coordinate of points will also change the order and indices of points.

Expressions can contain real numbers in normal or scientific format (e.g. 1.23e5), the constant pi, binary operators: +, -, *, /, ^, one-argument functions: sqrt, exp, log10, ln, sin, cos, tan, atan, asin, acos, gamma, lgamma (=ln(|gamma|)), abs, round (rounds to the nearest integer), two-argument functions: min2, max2 (e.g. max2(3,5) gives 5), randuniform(a, b) (a random number from the interval (a, b)), randnormal(mu, sigma) (a random number from the normal distribution) and the ternary ?: operator: condition ? expression1 : expression2, which evaluates expression1 if the condition is true and expression2 otherwise. Conditions can be built using boolean operators and comparisons: AND, OR, NOT, >, >=, <, <=, ==, != (or <>), TRUE, FALSE.

The value of a data expression can be shown using the command info; see the examples at the end of this section.

t[x=expression], where t is one of x, y, s, a, X, Y, S, A, gives the linear interpolation of t between two points (or the value of the first/last point if the given x is outside the current data range).

Important note: all operations are performed on real numbers. Two numbers that differ by less than epsilon=1e-9, i.e. abs(a-b)<epsilon, are considered equal. Indices are also computed in the real number domain and then rounded to the nearest integer.
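
For example, using info as a calculator (a true condition prints as 1):

     i 1e-10 == 0  # 1 (true), because the difference is below epsilon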

Transformations can be joined with a comma (,), e.g. X=y, Y=x swaps the axes.

Before and after executing transformations, points are always sorted according to the x coordinate. You can change the order of points using order=t, where t is one of x, y, s, a, -x, -y, -s, -a. It only makes sense within a sequence of transformations joined with commas, because after the transformations finish, the points are reordered again.

Points can be deleted using the following syntax: delete[index-or-range] or delete(condition), and created simply by increasing the value of M.

There are two parametrized functions: spline and interpolate. The general syntax is: parametrizedfunc [param1, param2](expression), e.g. spline[22.1, 37.9, 48.1, 17.2, 93.0, 20.7](x) gives the value at x of the cubic spline interpolation through the points (22.1, 37.9), (48.1, 17.2), ... The interpolate function is similar, but gives a polyline interpolation. The spline function is used for manual background subtraction in the GUI.

There are also aggregate functions: min, max, sum, avg, stddev, darea. They have two forms. In the simpler one: aggregatefunc (expression), the value of the expression in brackets is calculated for all points. min gives the smallest value, max the largest; sum, avg and stddev give the sum of all values, the arithmetic mean and the standard deviation, respectively. A true value in a data expression is represented numerically by 1, and false by 0, so sum can also be used to count points that fulfill a given criterion.

darea gives the sum of expressions calculated using the formula: t*(x[n+1]-x[n-1])/2, where t is the value of the expression in brackets. darea(y) gives the area under the interpolated data points, and can be used to normalize the area.

The second form: aggregatefunc (expression if condition) takes into account only points for which the condition is true.

A few examples:

     
     Y[1...] = Y[n-1] + y[n] # integrate

     x[...-1] = (x[n]+x[n+1])/2;  # these three lines reduce
     y[...-1] = y[n]+y[n+1];      # the number of points
     delete(n%2==1)               # by half

     delete(not a) # delete inactive points

     X = 4*pi * sin(x/2*pi/180) / 1.54051 # changes x scale (2theta -> Q)

     # make equal step, keep the number of points the same
     X = x[0] + n * (x[M-1]-x[0]) / (M-1),  Y = y[x=X], S = s[x=X], A = a[x=X]

     # take the first 2000 points, average them and subtract this as background
     Y = y - avg(y if n<2000)

     # fityk can be used as a simple calculator
     i 2+2 #4
     i sin(pi/4)+cos(pi/4) #1.41421
     i gamma(10) #362880

     # examples of aggregate functions
     i max(y) # the largest y value
     i sum(y>avg(y)) # the number of points which have y value greater than arithmetic mean
     Y = y / darea(y) # normalize data area
     i darea(y-F(x) if 20<x<25)
     

Functions and variables in data transformations

The information in this section is rarely needed in practice. Read it after reading the section called “Sum of fitted functions”.

Variables ($foo) and functions (%bar) can be used in data transformations, and the current value of a data expression can be assigned to a variable. Values of function parameters (e.g. %fun[a0]) and the pseudo-parameters Center, Height, FWHM and Area (e.g. %fun[Area]) can also be used. Pseudo-parameters are supported only by functions that know how to calculate these properties.

Some properties of functions can be calculated using the functions numarea, findx and extremum.

numarea(%f, x1, x2, n) gives the area integrated numerically from x1 to x2 using the trapezoidal rule with n equal steps.

findx(%f, x1, x2, y) finds x in the interval (x1, x2) such that %f(x)=y, using the bisection method combined with the Newton-Raphson method. It is required that %f(x1) < y < %f(x2).

extremum(%f, x1, x2) finds x in the interval (x1, x2) such that %f'(x)=0, using the bisection method. It is required that %f'(x1) and %f'(x2) have different signs.

A few examples:

     
      $foo = {y[0]} # data expression can be used in variable assignment
      Y = y / $foo  # and variables can be used in data transformation

      Y = y - %f(x) # subtracts function %f from data

      Y = y - @0.F(x) # subtracts all functions in F

      %c = Constant(~0) -> Z  # fit constant zero-shift (it can be caused...
      fit                # ...by shift in scale of instrument collecting data),
      X = x + @0.Z(x)  # ...remove it from dataset,
      del %c           # ...and delete it from sum

      info numarea(%fun, 0, 100, 10000) # shows area of function %fun 
      info %fun[Area] # it is not always supported

      info %_1(extremum(%_1, 40, 50)) # shows extremum value

      # calculate FWHM numerically, value 50 can be tuned
      $c = {%f[Center]}
      i findx(%f, $c, $c+50, %f[Height]/2) - findx(%f, $c, $c-50, %f[Height]/2)
      i %f[FWHM] # should give almost the same
     

Working with many datasets

Let us call a set of data, usually coming from one file, a dataset. All operations described above assume that there is only one dataset. If more datasets have been created, it must be written explicitly which dataset a command applies to, e.g. M=500 in @0. Datasets have numbers and are referenced by '@' followed by the number, e.g. @3. @* means all datasets, and Y=y/10 in @* does what you would expect.

Command

@+ < filename [filetype] [xcol,ycol [,scol] ]

will load a dataset into a new slot. Using @+ increases the number of datasets, and the command delete @n decreases it. It is also possible to duplicate a dataset (command @+ < @n) or to create a new dataset as the sum of two or more existing ones (command @+ < @n + @m + ...).

Each dataset has a separate sum, i.e. a model that can be fitted to the data. This is explained in the section called “Sum of fitted functions”.

Each dataset has a title (which does not have to be unique). When a file is loaded, a title is created automatically, either from the filename or read from the file (depending on the file format). The title is used when exporting data: some file formats require it (at this moment such formats are not supported). The title can be changed using the command @n.title=new-title . To see the title of a dataset, use info @n.
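
A short example (foo.dat is a hypothetical file name; @0 is assumed to be loaded already):

     @+ < foo.dat     # load a file into a new slot (@1)
     @+ < @0 + @1     # create @2 as the sum of @0 and @1
     @2.title=sum     # change the title of the new dataset
     info @2          # show the title
     delete @2        # remove the dataset again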

Exporting data

Command

dataslot (expression, ...) > filename

can export data to an ASCII TSV (tab separated values) file. To export data in the 3-column format (x, y and standard deviation), use @n (x, y, s) > file. If a is not listed in the list of columns, as in this example, only active points are exported.

All expressions that can be used on the right hand side of data transformations can also be used in the column list, e.g. @0 (n+1, x, y, @0.F(x), y-@0.F(x), @0.Z(x), %foo(x), a, sin(pi*x)+y^2) > bar.tsv. [To make it easier to export all functions in F separately, there is an exception in the syntax: "*F(x)" is replaced with all functions in the sum of the exported dataset, e.g. with "%_1(x) %_2(x) %_3(x)". Now I think it's not a good idea and I'm going to remove it. If you find it useful, let me know.]

Sum of fitted functions

Sum - Introduction

The sum of functions S, the curve that is fitted to the data, is itself a function. Its value is computed as a sum of component functions, like Gaussians or polynomials:

         S = \sum_i f_i

where f_i is a function of x and depends on a vector of parameters a. This vector contains all fitted parameters. Because we often have the situation that the error in the x coordinate of data points can be modeled with a function z(x; a), we introduce this term into the sum:

         S(x; a) = \sum_i f_i(x + z(x; a); a)

where

         z(x; a) = \sum_j z_j(x; a)

Note that the same z(x) is used in all functions f_i.

Now let us have a closer look at the f_i functions. Every function f_i has a type chosen from the function types available in the program. The same is true of the functions z_j. One of these types is the Gaussian. It has the following formula:


         height exp[-ln(2) ((x-center)/hwhm)^2]

A Gaussian has three parameters. These parameters do not depend on x. There must be one variable bound to each parameter.

Variables

Variables in Fityk have names prefixed with a dollar sign ($). A variable is created by assigning a value to it, e.g. $foo=~5.3 or $c=3.1 or $bar=5*sin($foo). $foo here is a so-called simple variable. It is created by assigning to it a real number prefixed with ~. The `~' means that the value assigned to the variable can be changed when fitting the sum to the data. For people familiar with optimization techniques: the number of defined simple variables is the number of dimensions of the space in which we look for the optimum. The variable $c is actually a constant. $bar depends on the value of $foo: when $foo changes, the value of $bar also changes. Compound variables can be built using the operators +, -, *, /, ^ and the functions sqrt, exp, log10, ln, sin, cos, tan, atan, asin, acos, lgamma. Note that this is a subset of the functions used in data transformations.

Every simple variable has a value and, optionally, a domain. The domain is used only by fitting algorithms that need to randomly initialize or change variables; Genetic Algorithms are a good example. [TODO: setting the domain is not implemented at this moment, but will be added soon]

Variables can be used in data transformations. Also, the value of a data expression can be used in a variable definition, but it must be inside braces, e.g. $bleh={M} or, to create a simple variable: $bleh=~{M}.

Sometimes it is useful to freeze a variable, i.e. to prevent it from changing while fitting. There is no special syntax for this, but it can be done using data expressions in this way:

      $a = ~12.3 # $a is fittable
      $a = {$a}  # $a is not fittable
      $a = ~{$a}  # $a is fittable again
     

It is also possible to define a variable as, e.g., $bleh=~9.1*exp(~2). In this case two simple variables (with values 9.1 and 2) are created automatically. Automatically created variables are named $_1, $_2, $_3, etc.

Variables can be deleted using command delete $variable.

Some fitting algorithms need to randomize the parameters of the fitted function (i.e. the simple variables). For this purpose, a simple variable can have a specified domain. Note that the domain does not impose any constraints on the value the variable can take. It is only a hint for fitting methods like the Nelder-Mead simplex or Genetic Algorithms. Read the descriptions of these methods to learn how the domain is used. The syntax is as follows:

      $a = ~12.3 [11 +- 5] # the center and width of the domain are given

      $b = ~12.3 [ +- 5] # if the center of the domain is not specified, 
                         # current value of the variable is used
     

Function types and functions

Let us go back to functions. Function types have names that start with an upper case letter, e.g. Linear or Voigt. Functions (i.e. function instances) have names prefixed with a percent sign, e.g. %func. Every function has a type and variables bound to its parameters.

To see the list of available function types, use the command info types. You can also use the command info typename, e.g. info Pearson7, to see parameter names, default values and the formula.

A function can be created by giving the type and a proper number of comma-separated variables in brackets, e.g.: %f = Gaussian(~66254., ~24.7, ~0.264) or %f = Gaussian(~6e4, $ctr, $b+$c). Any expression that is valid on the right hand side of a variable assignment can be given as an argument. If it is not just the name of a variable, an automatic variable is created. In the last example two variables are created (the value 60000 and the sum).

The second way is to give named parameters of the function, in any order, e.g. %f = Gaussian(height=~66254., hwhm=~0.264, center=~24.7). Function types can have default values specified for some parameters, so this assignment is also valid: %f = Pearson7(height=~66254., center=~24.7, fwhm=~0.264), although the shape parameter of Pearson7 is not given.

A deep copy of a function (i.e. all variables it depends on are also copied) can be made using the command %function = copy(%anotherfunction).
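
For example:

     %g = copy(%f)  # %g gets its own copies of all variables %f depends on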

Functions can also be created with the command guess, as described in the section called “Guessing peak location”.

You can change the variable bound to any function parameter in this way:

      =-> %f = Pearson7(height=~66254., center=~24.7, fwhm=~0.264)
      New function %f was created.
      =-> %f[center]=~24.8
      =-> $h = ~66254
      =-> %f[height]=$h
      =-> info %f
      %f = Pearson7($h, $_5, $_3, $_4)
      =-> $h = ~60000 # variables are kept by name, so this also changes %f
      =-> %p1[center] = %p2[center] + 3 # keep fixed distance between %p1 and %p2
     

Functions can be deleted using the command delete %function.

User-defined functions (UDF)

User-defined function types can be created using the command define, and then used in the same way as built-in function types. The name of the new type must start with an upper-case letter, contain only letters and digits, have at least two characters, and cannot be the same as the name of a built-in function. Defined functions can be undefined using the command undefine.

The name of the UDF should be followed by parameters in brackets (see the examples). Names of parameters should contain only lowercase alphanumeric characters and the underscore (_), and should start with a lowercase letter. The name "x" is reserved; do not put it into the parameter list, just use it on the right hand side of the definition.

Each parameter can have a default value specified. To allow adding a peak with the command guess, the default value is given as an expression which can be calculated from the known "height", "center", "fwhm" and "area". If the parameter name itself is one of the following: "height", "center", "fwhm", "area" or "hwhm", the default value is deduced (in the case of "hwhm" it is "fwhm/2").

UDFs can be defined either by giving the full formula, or as a sum or modification of already defined functions. Hopefully the examples below make the syntax clear.

How it works (you can skip this paragraph): the formula is parsed, derivatives of the formula are calculated symbolically, all expressions are simplified (but there is a lot of room for optimization here), bytecode is created for a kind of virtual machine, and, when fitting, the VM calculates the value of the function and its derivatives at every point. Common Subexpression Elimination is not implemented yet; I suppose it will noticeably speed up UDFs.

Hint: use the init file for frequently used definitions. See the section called “Invoking fityk” for details.

Examples:

     
     # first, here is how some built-in functions could be defined
     define MyGaussian(height, center, hwhm) = height*exp(-ln(2)*((x-center)/hwhm)^2)
     define MyLorentzian(height, center, hwhm) = height/(1+((x-center)/hwhm)^2)
     define MyCubic(a0=height,a1=0, a2=0, a3=0) = a0 + a1*x + a2*x^2 + a3*x^3

     # supersonic beam arrival time distribution
     define SuBeArTiDi(c, s, v0, dv) = c*(s/x)^3*exp(-(((s/x)-v0)/dv)^2)/x
     

     # area-based Gaussian can be defined as modification of built-in Gaussian
     # (it is the same as built-in GaussianA function)
     define GaussianArea(area, center, hwhm) = Gaussian(area/fwhm/sqrt(pi*ln(2)), center, hwhm) 

     # sum of Gaussian and Lorentzian -- it is already defined as PseudoVoigt
     define GLSum(height, center, hwhm, shape) = Gaussian(height*(1-shape), center, hwhm) + Lorentzian(height*shape, center, hwhm)

     # to change definition of UDF, first undefine previous definition
     undefine GaussianArea
     

Speed of computations

With default settings, the value of every function is calculated at every point. Functions like a Gaussian often have a non-negligible value only in a small fraction of all points. To speed up the calculation, set the option cut-function-level to a non-zero value. Note that some functions may not support this optimization at all, and for others approximations are used, so the exact cut-off level can differ.
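
For example (the chosen level is arbitrary):

     set cut-function-level = 0.01  # neglect function values below 0.01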

Sum, F and Z

As was already mentioned, each dataset has a separate sum, i.e. a model that can be fitted to the data. As can be seen in the formula above, the sum consists of the functions f_i and z_j. Each dataset has two sets, named F and Z, which contain the names of functions. The sum is constructed by specifying which functions are in F and which are in Z.

In many cases Z can be ignored; the fitted curve is then simply the sum of all functions in F. The functions can be listed with the command info F.

The command %function -> F puts %function into F, the command %function -> Z puts %function into Z, and the command %function -> N removes %function from F or Z. If there is more than one dataset, F, Z and N must be prefixed with the dataset number, e.g. %function -> @1.F or %function -> @0.N . The following syntax is also valid:

  # create a function and add it to F
  %g = Gaussian(height=~66254., hwhm=~0.264, center=~24.7) -> @0.F
  # create an automatically named function and add it to F
  Gaussian(height=~66254., hwhm=~0.264, center=~24.7) -> @0.F
  # clear F
  @0.F = 0
  # clear F and put three functions in it
  @0.F = %a, %b, %c
  # make @1.F the exact (shallow) copy of @0.F
  @1.F = @0.F
  # make @1.F a deep copy of @0.F (it means all functions and variables
  # are duplicated).
  @1.F = copy(@0.F) 

The sum can be exported as data points, using the syntax described in the section called “Exporting data”, or as a mathematical formula, using the command @n.formula > filename. Some primitive simplifications are applied to the formula; to prevent them, put a plus sign (+) after ".formula". Peak parameters can be exported using the command @n.peaks > filename. Put a plus sign (+) after ".peaks" to also export the symmetric errors of the parameters. "@*" will export the formulae or parameters of all datasets to the same file. [I think I'll change the syntax described in this paragraph, but I'm not sure yet.]

It is often required to keep the width or shape constant for all peaks in a dataset. To change the variables bound to parameters with a given name in all functions in F, use the command: F[param]=variable . Examples:

  F[hwhm]=$foo # hwhm's of all functions in F that have a parameter hwhm will be
               # equal to $foo (hwhm here means half-width-at-half-maximum)
  F[shape]=%_1[shape]  # the variable bound to the shape of peak %_1 is bound
                       # also to the shapes of all functions in F
  F[hwhm]=~0.2  # for every function in F a new variable is created and bound
                # to the parameter hwhm; all these parameters are independent

Guessing peak location

It is possible to guess the peak location and add the peak to F with the command: %name = guess PeakType [x1:x2] in @n , e.g. guess Gaussian [22.1:30.5] in @0. If the range is omitted, the whole dataset is searched. The name of the function is optional. Some parameters can be specified with the syntax parameter=variable, e.g. guess PseudoVoigt [22.1:30.5] center=$ctr, shape=~0.3 in @0.

Fityk offers only a primitive algorithm for peak detection: it looks for the highest point in the given range, and then tries to estimate the width of the peak.

If the highest point is found near the boundary of the given range, it is very likely that it is not a peak top; in that case, if the option can-cancel-guess is set to true, the guess is canceled.

There are two real-number options related to guess: height-correction and width-correction. Their default value is 1. The guessed height and width are multiplied by the values of these options, respectively.
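
For example, if guessed peaks come out consistently too low and too wide, something like this may help (the values are arbitrary):

     set height-correction = 1.2  # make guessed heights 20% larger
     set width-correction = 0.8   # make guessed widths 20% smaller
     guess Gaussian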

Displaying information

If you are using the GUI, most information can be displayed with mouse clicks. Otherwise, you can use the info command. Using info+ instead of info sometimes displays more detailed information. The command info guess range shows where the guess command would find a peak. info functions lists all defined functions, and info variables lists all variables. info @n.F and info @n.Z show information about F and Z, info @n.formula shows the mathematical formula of the fitted function, and info @n.dF(x) compares the symbolic and numeric derivatives at x (useful only for debugging).

Fitting

Nonlinear optimization

This is the core. We have a set of observations (data points), and we want to fit a model (a sum of functions) that depends on adjustable parameters to the observations. Let me quote Numerical Recipes, chapter 15.0, page 656 (if you do not know the book, visit http://www.nr.com ):

The basic approach in all cases is usually the same: You choose or design a figure-of-merit function (merit function, for short) that measures the agreement between the data and the model with a particular choice of parameters. The merit function is conventionally arranged so that small values represent close agreement. The parameters of the model are then adjusted to achieve a minimum in the merit function, yielding best-fit parameters. The adjustment process is thus a problem in minimization in many dimensions. [...] however, there exist special, more efficient, methods that are specific to modeling, and we will discuss these in this chapter. There are important issues that go beyond the mere finding of best-fit parameters. Data are generally not exact. They are subject to measurement errors (called noise in the context of signal-processing). Thus, typical data never exactly fit the model that is being used, even when that model is correct. We need the means to assess whether or not the model is appropriate, that is, we need to test the goodness-of-fit against some useful statistical standard. We usually also need to know the accuracy with which parameters are determined by the data set. In other words, we need to know the likely errors of the best-fit parameters. Finally, it is not uncommon in fitting data to discover that the merit function is not unimodal, with a single minimum. In some cases, we may be interested in global rather than local questions. Not, "how good is this fit?" but rather, "how sure am I that there is not a very much better fit in some corner of parameter space?"

Our merit function is the WSSR, the weighted sum of squared residuals, also called chi-square:


         chi^2 = \sum_{i=1}^{N} [ (y_i - y(x_i; a)) / sigma_i ]^2
               = \sum_{i=1}^{N} w_i [ y_i - y(x_i; a) ]^2

Weights are based on the standard deviations, w_i = 1/sigma_i^2. You can learn why squares of residuals are minimized, e.g., from chapter 15.1 of Numerical Recipes. So we are looking for the global minimum of chi^2. This large field of numerical research - looking for a minimum or maximum - is usually called optimization; here it is non-linear and global optimization. Fityk implements three very different optimization methods. All are well known and described in many books.

Fitting related commands

To fit sum to data, use command

fit [+] [number-of-iterations] [in @n, ...]

The plus sign (+) means that the fitting method is not re-initialized; it is used to continue the previous fitting. All non-linear fitting methods are iterative, and number-of-iterations is the maximum number of iterations. There are also other stopping criteria, so the number of executed iterations can be smaller.

The fitting method can be set using the set command: set fitting-method = method, where method is one of: Levenberg-Marquardt, Nelder-Mead-simplex, Genetic-Algorithms.

All non-linear fitting methods are iterative, and there are two common stopping criteria. The first is the number of iterations, which can be specified after the fit command. The second is the number of evaluations of the objective function (WSSR), specified by the value of the option max-wssr-evaluations (0 = unlimited). It is approximately proportional to the computation time, because most of the time in the fitting process is spent evaluating the WSSR. There are also other criteria, different for each method.

If you give too small a number of iterations to the fit command, and the fit stops because of this rather than because of convergence, it makes sense to use the fit+ command to process further iterations. [TODO: how to stop fit interactively]
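
For example:

     fit 100   # run at most 100 iterations
     fit+ 100  # continue the same fit for up to 100 more iterations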

The setting set autoplot = on-fit-iteration will draw a plot after every iteration, to visualize progress (see autoplot).

Information about the goodness of fit can be displayed using info fit. To see the symmetric errors, use info errors; info+ errors also shows the variance-covariance matrix.

The available methods can be mixed together; e.g. it is sensible to obtain initial parameter estimates using the simplex method, and then fit using the Levenberg-Marquardt method. The command s.history can be useful for trying various methods with different options and/or initial parameters and choosing the best solution.
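
A sketch of such a mixed strategy, using the with syntax described in the section called “Settings”:

     with fitting-method = Nelder-Mead-simplex fit 100  # rough estimates
     with fitting-method = Levenberg-Marquardt fit      # refinement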

Levenberg-Marquardt

This is the standard nonlinear least-squares routine. It involves computing the first derivatives of the functions. For a description of the L-M method see Numerical Recipes, chapter 15.5, or Siegmund Brandt, Data Analysis, chapter 10.15. In a few words: it combines the inverse-Hessian method (also called the Gauss-Newton method) with the steepest descent method by introducing a lambda factor. When lambda is equal to 0, the method is equivalent to the inverse-Hessian method. When lambda increases, the shift vector is rotated toward the direction of steepest descent and its length decreases. (The shift vector is the vector that is added to the parameter vector.) If a better fit is found in an iteration, lambda is decreased: it is divided by the value of the lm-lambda-down-factor option (default: 10). Otherwise, lambda is multiplied by the value of lm-lambda-up-factor (default: 10). The initial lambda value is equal to lm-lambda-start (default: 0.0001).

The Marquardt method has two stopping criteria in addition to the common ones. If twice in a row the relative change of the value of the objective function (WSSR) is smaller than the value of the lm-stop-rel-change option, the fit is considered converged and is stopped. Fitting is also stopped if lambda becomes greater than the value of the lm-max-lambda option (default: 10^15), which usually happens when, due to limited numerical precision, the WSSR is no longer changing.

Nelder-Mead downhill simplex method

This time I am quoting chapter 4.8.3, p. 86 of Peter Gans, Data Fitting in the Chemical Sciences by the Method of Least Squares:

A simplex is a geometrical entity that has n+1 vertices corresponding to variations in n parameters. For two parameters the simplex is a triangle, for three parameters the simplex is a tetrahedron and so forth. The value of the objective function is calculated at each of the vertices. An iteration consists of the following process. Locate the vertex with the highest value of the objective function and replace this vertex by one lying on the line between it and the centroid of the other vertices. Four possible replacements can be considered, which I call contraction, short reflection, reflection and expansion.[...]

It starts with an arbitrary simplex. Neither the shape nor position of this are critically important, except insofar as it may determine which one of a set of multiple minima will be reached. The simplex then expands and contracts as required in order to locate a valley if one exists. Then the size and shape of the simplex is adjusted so that progress may be made towards the minimum. Note particularly that if a pair of parameters are highly correlated, both will be simultaneously adjusted in about the correct proportion, as the shape of the simplex is adapted to the local contours.[...]

Unfortunately it does not provide estimates of the parameter errors, etc. It is therefore to be recommended as a method for obtaining initial parameter estimates that can be used in the standard least squares method.

This method is also described in the previously mentioned Numerical Recipes (chapter 10.4) and Data Analysis (chapter 10.8).

Note

The rest of this section is outdated: at this moment, the settings of fitting methods cannot be changed. This will be fixed as soon as possible.

There are a few options for tuning this method. One of them is the stopping criterion min-fract-range. If the value of the expression 2(M-m)/(M+m), where M and m are the values of the worst and best vertices respectively (the values of the objective function at the vertices, to be precise), is smaller than the value of the min-fract-range option, fitting is stopped. In other words, fitting is stopped if all vertices are at almost the same level.

The remaining options are related to the initialization of the simplex. Before starting iterations, we have to choose a set of points in the parameter space, called vertices. Unless the option move-all is set, one of these points will be the current point, i.e. the values the parameters have at this moment. All the others are drawn as follows: each parameter of each vertex is drawn separately, from a distribution that is centered on the center of the parameter's domain and has a width proportional to both the width of the domain and the value of the move-multiplier option. The distribution type can be set using the option distrib-type to one of: uniform, Gaussian, Lorentzian and bound. The last one causes the value of the parameter to be either the greatest or the smallest value in the domain of the parameter - one of the two bounds of the domain (assuming that move-multiplier equals 1).

Genetic Algorithms

[TODO]

Settings

This chapter is not about GUI settings (things like colors, fonts, etc.), but about settings that are common for both CLI and GUI versions.

The command info set shows the syntax of the set command and lists all possible options. set option shows the current value of an option, and set option = value changes it. It is also possible to change the value of an option for a single command only, by prefixing the command with: with option = value . The examples at the end of this section should make this clearer.

Some fitting methods, and functions like randnormal in data expressions, use a pseudo-random number generator. In some situations one may want to have repeatable and predictable fitting results, e.g. to prepare a presentation. The seed for a new sequence of pseudo-random numbers can be set using the option pseudo-random-seed. If it is set to 0, the seed is based on the current time, and the sequence of pseudo-random numbers is different each time.

[TODO: the rest of options...]

Examples:

     set fitting-method  # show info
     set fitting-method = Nelder-Mead-simplex # change default method
     set verbosity = verbose
     with fitting-method = Levenberg-Marquardt fit 10 
     with fitting-method=Levenberg-Marquardt, verbosity=only-warnings fit 10
    

Other commands

plot: viewing data

In the GUI version there is hardly ever a need to use this command directly.

The command plot controls the visualization of data and the sum. It is used to plot a given area: in the GUI it is plotted in the program's main window; in the CLI the popular program gnuplot is used, if available.

plot [xrange [yrange] ]

xrange and yrange have one of the two following syntaxes:

     [min:max]    (either min or max may be omitted)

     .

The second one is just a dot (.); it leaves the corresponding range unchanged.

Examples:

   plot [20.4:50] [10:20] # show x from 20.4 to 50 and y from 10 to 20

   plot [20.4:] # x from 20.4 to the end, 
                # y range will be fitted to contain all data

   plot . [:10] # x range will not be changed, y from the lowest point to 10 
   plot [:] [:] # all data will be shown
   plot         # all data will be shown
   plot . .     # nothing changes
     

The value of the option autoplot changes the automatic plotting behaviour. By default, the plot is refreshed automatically after changing the data or the sum of functions. It is also possible to visualize every iteration of the fitting method by replotting the peaks after each iteration.

info: show information

First, there is an option verbosity (not related to the command info) which determines the number of messages displayed while executing commands.

If you are using the GUI, most information can be displayed with mouse clicks. Otherwise, you can use the info command. Using info+ instead of info sometimes displays more detailed information.

The output of info can be redirected to a file: the syntax info args > filename truncates the file, and info args >> filename appends to it.
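
For example (result.txt is a hypothetical file name):

     info fit > result.txt       # write goodness-of-fit information, truncating the file
     info+ errors >> result.txt  # append errors and the variance-covariance matrix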

[TODO: list of all info arguments]

commands, dump, sleep, reset, quit

All commands given during program execution are stored in memory. They can be listed using the command: info commands [n:m] or written to a file: commands [n:m] > filename . To put all commands executed so far during the session into the file foo.fit, type commands[:] > foo.fit. With a plus sign (+) (i.e. info commands+ [n:m] and commands+ [n:m] > filename), information about the exit status of the commands is added.

To log commands to a file as they are executed, use: commands > filename or, to log the output as well: commands+ > filename . To stop logging, use: commands > /dev/null .

Scripts can be executed using the command: commands < filename .

There is also a command dump > filename, which is not working at this moment. [TODO]

The command sleep sec makes the program wait sec seconds, doing nothing.

The command quit works as expected. If this command is found in a script, it closes the whole program, not only the script.

If the option exit-on-warning is set, any warning will close the program. This ensures that no warning can be overlooked.

Chapter 4. Using and extending

Use cases

[TODO]

Extensions

How to add your own built-in function

To add a built-in function, you have to change the source of the program and recompile it. Users who want to do this should be able to compile the program from source and know the basics of C, C++ or another programming language.

The description here is not complete. If something is not clear, you can always send me an e-mail.

"fp" you can see in fityk source means a real (floating point) number (typedef double fp).

The name of your function should start with an uppercase letter and contain only letters and digits. Let us add a function Foo with the formula: Foo(height, center, hwhm) = height/(1+((x-center)/hwhm)^2). The C++ class representing Foo will be named FuncFoo.

In src/func.cpp find a list of functions:

     
       ...
       FACTORY_FUNC(Polynomial6)
       FACTORY_FUNC(Gaussian)
       ...
      

and add:

     
       FACTORY_FUNC(Foo)
      

Then find another list:

     
       ...
       FuncPolynomial6::formula,
       FuncGaussian::formula,
       ...
      

and add line

     
      FuncFoo::formula,
     

Note that in the second list all items but the last one are followed by a comma.

Write the function formula in the same file in this way:

     
      const char *FuncFoo::formula
      = "Foo(height, center, hwhm) = height/(1+((x-center)/hwhm)^2)";
     

For built-in functions, only the left hand side of the formula is parsed, i.e. the expression "Foo(height, center, hwhm)" is analysed by the program and has to have valid syntax. The parameter names "height", "area", "center", "fwhm" and "hwhm" are recognized; hwhm is the half width at half maximum, and fwhm is the full width at half maximum. If you want to be able to add the function with the command guess, you must provide default values for all parameters with other names, e.g.: SplitPearson7(height, center, hwhm1=fwhm*0.5, hwhm2=fwhm*0.5, shape1=2, shape2=2). Type "i+ types" for other examples. Currently the syntax for giving default values is very limited: a default value can be a real number, one of the recognized parameter names (except hwhm), or a recognized parameter name multiplied by a real number.

In the file src/bfunc.h, start writing the definition of your class:

      class FuncFoo : public Function
      {
          DECLARE_FUNC_OBLIGATORY_METHODS(Foo)
     

If you want to perform some calculations every time the parameters of the function change, you can do it in the method do_precomputations. This possibility is provided for calculating expressions which do not depend on x. Write the declaration here:

     void do_precomputations(std::vector<Variable*> const &variables);
     

and provide proper definition of this method in src/bfunc.cpp.

If you want to optimize the calculation of your function by neglecting its value outside of a given range (see the option cut-function-level in the program), you will need the method:

      bool get_nonzero_range (fp level, fp &left, fp &right) const;
     

This method takes the level below which the value of the function can be approximated by zero, and should set the left and right variables to proper values of x, such that if x<left or x>right then |f(x)|<level. If the function sets left and right, it should return true.

If your function does not have a "center" parameter, but there is a center-like point where you want the peak top to be drawn, write:

      bool has_center() const { return true; }
      fp center() const { return vv[1]; }
     

In the second line, between return and the semicolon, there is an expression for the x coordinate of the peak top; vv[0] is the first parameter of the function, vv[1] is the second, etc.

Finally close the definition of the class with:

      };
     

Write in src/bfunc.cpp how to calculate value of the function:

      FUNC_CALCULATE_VALUE_BEGIN(Foo)
          fp xa1a2 = (x - vv[1]) / vv[2];
          fp inv_denomin = 1. / (1 + xa1a2 * xa1a2);
      FUNC_CALCULATE_VALUE_END(vv[0] * inv_denomin)
     

The expression at the end (i.e. vv[0]*inv_denomin) is the calculated value. xa1a2 and inv_denomin are variables introduced to simplify the expression. Note the "fp" (you can also use "double") at the beginning and the semicolon at the end of both lines. The meaning of vv was already explained. Usually it is more difficult to calculate the derivatives:

      FUNC_CALCULATE_VALUE_DERIV_BEGIN(Foo)
          fp xa1a2 = (x - vv[1]) / vv[2];
          fp inv_denomin = 1. / (1 + xa1a2 * xa1a2);
          dy_dv[0] = inv_denomin;
          fp dcenter = 2 * vv[0] * xa1a2 / vv[2] * inv_denomin * inv_denomin;
          dy_dv[1] = dcenter;
          dy_dv[2] = dcenter * xa1a2;
          dy_dx = -dcenter;
      FUNC_CALCULATE_VALUE_DERIV_END(vv[0] * inv_denomin)
     

You must set the derivatives dy_dv[n] for n=0,1,...,(number of parameters of your function - 1) and dy_dx. In the final brackets, the value of the function appears again.

After compiling the program, check whether the derivatives are calculated correctly using the command "info dF(x)", e.g. i dF(30.1). You can also use numarea, findx and extremum (see the section called “Functions and variables in data transformations” for details) to verify the center, area, height and FWHM properties.
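
For example, for the Foo function defined above one could check (a sketch; the variable values are arbitrary):

      %f = Foo(~10, ~5, ~0.5)
      i %f(5)                   # should print 10 (the height)
      i extremum(%f, 4, 6)      # should print 5 (the center)
      i 5 - findx(%f, 0, 5, 5)  # should print 0.5 (the hwhm)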

Hope this helps. Do not hesitate to change this description or ask questions if you have any.

Appendix A. List of functions

The list of all functions can be obtained using i+ types. In some formulae here, long parameter names (like "height", "center" and "hwhm") are replaced with a_i.

Equation A.1. Gaussian

 
       Type in program "info Gaussian" to see formula.

Equation A.2. SplitGaussian

 
       Type in program "info SplitGaussian" to see formula.

Equation A.3. GaussianA

 
       Type in program "info GaussianA" to see formula.

Equation A.4. Lorentzian

 
       Type in program "info Lorentzian" to see formula.

Equation A.5. LorentzianA

 
       Type in program "info LorentzianA" to see formula.

Equation A.6. Pearson VII (Pearson7)

 
       Type in program "info Pearson7" to see formula.

Equation A.7. Split-Pearson-VII (SplitPearson7)

 
       Type in program "info SplitPearson7" to see formula.

Equation A.8. Pearson-VII-Area (Pearson7A)

 
       Type in program "info Pearson7A" to see formula.

Equation A.9. Pseudo-Voigt (PseudoVoigt)

 
       Type in program "info PseudoVoigt" to see formula.

Pseudo-Voigt is a name for the sum of a Gaussian and a Lorentzian. The a3 parameters in Pearson VII and Pseudo-Voigt are not related.

Equation A.10. Pseudo-Voigt-Area (PseudoVoigtA)

 
       Type in program "info PseudoVoigtA" to see formula.

Equation A.11. Voigt

 
       Type in program "info Voigt" to see formula.

The Voigt function is a convolution of Gaussian and Lorentzian functions. a0 = height, a1 = center, a2 is proportional to the Gaussian width, and a3 is proportional to the ratio of the Lorentzian and Gaussian widths. Voigt is computed according to R.J. Wells, “Rapid approximation to the Voigt/Faddeeva function and its derivatives”, Journal of Quantitative Spectroscopy & Radiative Transfer 62 (1999) 29-48. (See also: http://personalpages.umist.ac.uk/staff/Bob.Wells/voigt.html). Is the approximation accurate enough for all possible uses of the fityk program?

Equation A.12. VoigtA

 
       Type in program "info VoigtA" to see formula.

Equation A.13. Exponentially Modified Gaussian (EMG)

 
       Type in program "info EMG" to see formula.

Equation A.14. Doniach-Sunjic (DoniachSunjic)

 
       Type in program "info DoniachSunjic" to see formula.

Equation A.15. Polynomial5

 
       Type in program "info Polynomial5" to see formula.

Appendix B. Command shortenings

The pipe symbol (|) shows the minimal length of a command. "def|ine" means that the shortest version is "def", but "defi", "defin" and "define" are also valid and mean exactly the same. Arguments of the "info" command cannot be shortened, i.e. you must write "i fit", not "i f". Commands which cannot be shortened are not listed here.

c|ommands
def|ine
f|it
g|uess
i|nfo
p|lot
s|et
undef|ine
w|ith

Appendix C. License

Fityk is free software; you can redistribute and modify it under the terms of the GNU General Public License, version 2. There is no warranty. The GPL is one of the most popular licenses, and it is worth reading if you have not done so before. The program is copyrighted by the author, and the license itself by the FSF. The text of the license is distributed with the program in the file COPYING.

Appendix D. About this manual

This manual is written in DocBook (XML) and converted to other formats. All changes, improvements, fixes of mistakes, etc. are welcome. The fitykhelp.xml file is distributed with the program sources and can be modified with any text editor.

Bibliography

[1] William Press, Saul Teukolsky, William Vetterling, and Brian Flannery. Numerical Recipes in C. http://www.nr.com.

[2] Peter Gans. Data Fitting in the Chemical Sciences by the Method of Least Squares. John Wiley & Sons. 1992.

[3] Siegmund Brandt. Data Analysis. Springer Verlag. 1999.

[4] PeakFit 4.0 for Windows User's Manual. AISN Software. 1997.

[5] Zbigniew Michalewicz. Algorytmy genetyczne + struktury danych = programy ewolucyjne. WNT. 1996.