Chapter 2. Installation

Make things as simple as possible, but not any simpler.

-- Albert Einstein

The Python Distutils are used to build and install PyTables, so it is fairly simple to get the application up and running. If you want to install the package from sources, go to the next section. If you are running Windows and want to install precompiled binaries, jump to Section 2.2. In addition, packages are available for many Linux distributions, for instance T2 Project, RockLinux, Debian and Gentoo. There are also packages for other Unix systems such as FreeBSD and MacOSX.

2.1. Installation from source

These instructions are for both Unix/Linux and Windows systems. If you are using Windows, it is assumed that you have a recent version of the MS Visual C++ compiler (>= 6.0) installed. A GCC compiler is assumed for Unix, but other compilers should work as well.

Extensions in PyTables have been developed in Pyrex (see [7]) and C. You can rebuild everything from scratch if you have Pyrex installed, but this is not necessary, as the Pyrex-compiled source is included in the distribution.

To compile PyTables you will need a recent version of Python, the HDF5 (C flavor) library, and the numarray (see [12]) package. Although you won't need NumPy (see [10]) or Numeric (see [11]) in order to compile PyTables, they are supported; you only need a reasonably recent version of them (>= 1.0 for NumPy and >= 24.2 for Numeric) if you plan on using them in your applications. If you already have NumPy and/or Numeric installed, the test driver module will detect them and will run the tests for NumPy and/or Numeric automatically.

2.1.1. Prerequisites

First, make sure that you have Python 2.3 or higher, HDF5 1.6.5 or higher, and numarray 1.5.2 or higher installed (I'm currently using HDF5 1.6.5 and numarray 1.5.2). If you don't, fetch and install them before proceeding.

Compile and install these packages (but see Section 2.2.1 for instructions on how to install precompiled binaries if you would rather not compile the prerequisites on Windows systems).
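The minimum versions above can be checked with a simple tuple comparison. The following is only a sketch: the helper names version_tuple and meets_minimum are made up for this example, and it assumes plain dotted version strings such as "1.6.5" (a suffix like "-patch1" would need extra handling).

```python
import sys

def version_tuple(text):
    """Turn a dotted version string such as "1.6.5" into (1, 6, 5)."""
    return tuple(int(part) for part in text.split("."))

def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    return version_tuple(found) >= version_tuple(minimum)

# Minimum versions quoted in this section.
MINIMUMS = {"HDF5": "1.6.5", "numarray": "1.5.2"}

# The running interpreter can be checked directly.
python_ok = sys.version_info[:2] >= (2, 3)
```

Comparing tuples of integers rather than raw strings avoids the pitfall where "1.10" sorts before "1.6" as text.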

For compression (and possibly improved performance), you will need to install the Zlib library (see [5]), which is also required by HDF5. You may also optionally install the excellent LZO compression library (see [13] and Section 5.3). The high-performance bzip2 compression library can also be used with PyTables (see [14]). The use of the UCL compression library is in the process of being deprecated[5], so it is recommended that you not use it unless you have to (for example, because you still have data files compressed with UCL). Meanwhile, you can force its support in PyTables by passing the --force-ucl flag to setup.py (see below).

Unix

setup.py will detect the HDF5, LZO, UCL and bzip2 libraries and include files under /usr or /usr/local; this covers most manual installations as well as installations from packages. If setup.py cannot find libhdf5 (or any of liblzo, libucl or libbz2 that you wish to use), or if you have several versions of a library installed and want to use a particular one, set the HDF5_DIR, LZO_DIR, UCL_DIR or BZIP2_DIR environment variable to the root directory of that library. You may also specify the locations of the library root directories on the setup.py command line. For example:

 --hdf5=/stuff/hdf5-1.6.5
 --lzo=/stuff/lzo-1.08
 --bzip2=/stuff/bzip2-1.0.3
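To make the lookup concrete, here is a rough Python sketch of how a library root could be resolved from these three sources. It is not the code used by setup.py: the function name find_library_root, the precedence (command line first, then environment, then the default prefixes) and the test for an include/ subdirectory are all assumptions made for illustration.

```python
import os

# Default prefixes named in the text; everything else here is illustrative.
DEFAULT_PREFIXES = ("/usr", "/usr/local")

def find_library_root(name, argv, environ, prefixes=DEFAULT_PREFIXES):
    """Return a candidate root directory for library `name` (e.g. "hdf5").

    Assumed order: a --<name>=<dir> command line flag, then the <NAME>_DIR
    environment variable, then the first default prefix with an include/ dir.
    Returns None when nothing is found.
    """
    flag = "--%s=" % name
    for arg in argv:
        if arg.startswith(flag):
            return arg[len(flag):]
    env_value = environ.get(name.upper() + "_DIR")
    if env_value:
        return env_value
    for prefix in prefixes:
        if os.path.isdir(os.path.join(prefix, "include")):
            return prefix
    return None
```

For instance, find_library_root("hdf5", sys.argv[1:], os.environ) would return /stuff/hdf5-1.6.5 when --hdf5=/stuff/hdf5-1.6.5 is given on the command line.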

Also, for non-standard installations of numarray, the location of its header files can be given like this:

 --numarray-headers=/stuff/numarray-1.5.5/numarray/include

You can force the compilation of the deprecated UCL compressor by passing the --force-ucl flag:

 --ucl=/stuff/ucl-1.03 --force-ucl        

If your HDF5 library was built as a shared library that is not in the runtime load path, you can specify the additional linker flags needed to find the shared library on the command line as well. For example:

 --lflags="-Xlinker -rpath -Xlinker /stuff/hdf5-1.6.5/lib"        

You may also want to try setting the LD_LIBRARY_PATH environment variable to point to the directory where the shared libraries can be found. Check your compiler and linker documentation as well as the Python Distutils documentation for the correct syntax or environment variable names.

It is also possible to link with specific libraries by setting the LIBS environment variable:

 LIBS="hdf5-1.6.5"
 LIBS="hdf5-1.6.5 nsl"

Finally, you can pass additional flags to your compiler by passing them to the --cflags flag:

 --cflags="-w -O3"        

In the above case, the gcc compiler is instructed to suppress all warnings (-w) and to use optimization level 3 (-O3).
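If you drive the build from a script, the flags in this section can be collected into a single setup.py invocation. The sketch below only assembles the argument list; the helper name build_command is hypothetical, and the build_ext --inplace command is the one used in Section 2.1.2.

```python
import sys

def build_command(options, python=sys.executable):
    """Build the argument list for "setup.py build_ext --inplace".

    `options` maps flag names without dashes (e.g. "hdf5", "cflags")
    to their values; flags are emitted in sorted order.
    """
    args = [python, "setup.py", "build_ext", "--inplace"]
    for flag in sorted(options):
        args.append("--%s=%s" % (flag, options[flag]))
    return args

command = build_command(
    {"hdf5": "/stuff/hdf5-1.6.5", "cflags": "-w -O3"}, python="python")
```

Passing the list to something like subprocess.call avoids shell quoting problems, since a value such as "-w -O3" stays in a single argument.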

Windows

Once you have installed the prerequisites, setup.py needs to know where the necessary library stub (.lib) and header (.h) files are installed. Set the following environment variables:

HDF5_DIR

Points to the root HDF5 directory (where the include/ and dll/ directories can be found). Mandatory.

LZO_DIR

Points to the root LZO directory (where the include/ and lib/ directories can be found). Optional.

BZIP2_DIR

Points to the root bzip2 directory (where the include/ and lib/ directories can be found). Optional.

UCL_DIR

Points to the root UCL directory (where the include/ and lib/ directories can be found). Optional, but discouraged.

For example:

 set HDF5_DIR=c:\stuff\5-165-win
 set LZO_DIR=c:\stuff\lzo-1-08
 set BZIP2_DIR=c:\stuff\bzip2-1-0-3

Or, you can pass this information to setup.py by setting the appropriate arguments on the command line. For example:

 --hdf5=c:\stuff\5-165-win
 --lzo=c:\stuff\lzo-1-08
 --bzip2=c:\stuff\bzip2-1-0-3

Also, for non-standard installations of numarray, the location of its header files can be given like this:

 --numarray-headers=c:\stuff\numarray-1-5-1\numarray\include

You can force the compilation of the deprecated UCL compressor by passing the --force-ucl flag:

 --ucl=c:\stuff\ucl-1-02 --force-ucl

You can get ready-to-use Windows binaries and other development files for most of those libraries from the GnuWin32 project (see [20]).

2.1.2. PyTables package installation

Once you have installed the HDF5 library and the numarray package, you can proceed with the PyTables package itself:

  1. Run this command from the main PyTables distribution directory, including any extra command line arguments as discussed above:

     python setup.py build_ext --inplace

    Depending on the compiler flags used when compiling your Python executable, many warnings may appear. Don't worry, almost all of them are caused by variables that are declared but never used. That's normal in Pyrex extensions.

  2. To run the test suite, change into the tables/tests directory and execute this command:

    Unix

    In the shell sh and its variants:

     PYTHONPATH=../.. python test_all.py

    Windows

    Open a DOS terminal and type:

     set PYTHONPATH=..\..
     python test_all.py

    If you would like to see verbose output from the tests simply add the flag -v and/or the word verbose to the command line. You can also run only the tests in a particular test module. For example, to execute just the types test:

     python test_types.py -v

    If a test fails, please enable verbose output (the -v flag and the verbose option), run the failing test module again and, very importantly, get your PyTables version information by running the command:

     python test_all.py --show-versions            

    and send the output to the developers so that we can continue improving PyTables.

    If you run into problems because Python cannot load the HDF5 library or other shared libraries:

    Unix

    Try setting the LD_LIBRARY_PATH environment variable to point to the directory where the missing libraries can be found.

    Windows

    Put the DLLs (hdf5dll.dll and, optionally, lzo1.dll and bzip2.dll) in a directory listed in your PATH environment variable or in python_installation_path\Lib\site-packages\tables (this last directory may not exist yet, so if you want to install the DLLs there, you should do so after installing the PyTables package). The setup.py installation program will print a warning to that effect if the libraries cannot be found.

  3. To install the entire PyTables Python package, change back to the root distribution directory and run the following command (make sure you have sufficient permissions to write to the directories where the PyTables files will be installed):

     python setup.py install

    Of course, you will need super-user privileges if you want to install PyTables in a system-protected area. You can, however, select a different place to install the package using the --prefix flag:

     python setup.py install --prefix="/home/myuser/mystuff"

    Keep in mind, however, that if you use the --prefix flag to install in a non-standard place, you should set up your PYTHONPATH environment variable properly so that the Python interpreter can find your new PyTables installation.

    You have more installation options available in the Distutils package. Issue a:

     python setup.py install --help

    for more information on that subject.
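As a hint for the --prefix case in step 3, the directory to add to PYTHONPATH can be worked out as follows. This sketch assumes the default Unix layout used by Distutils (prefix/lib/pythonX.Y/site-packages); some systems use a different layout, so check where the tables package was actually copied. The function name site_packages_under is made up for this example.

```python
import posixpath
import sys

def site_packages_under(prefix, version=None):
    """Return the assumed Unix site-packages directory below `prefix`.

    `version` is the "X.Y" Python version; it defaults to the version
    of the running interpreter.
    """
    if version is None:
        version = "%d.%d" % sys.version_info[:2]
    return posixpath.join(prefix, "lib", "python" + version, "site-packages")

# Directory to put in PYTHONPATH for the example prefix and Python 2.4.
target = site_packages_under("/home/myuser/mystuff", "2.4")
```

In this example, target is /home/myuser/mystuff/lib/python2.4/site-packages, which is the value you would add to PYTHONPATH.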

That's it! Now you can skip to the next chapter to learn how to use PyTables.

2.2. Binary installation (Windows)

This section is intended for installing precompiled binaries on Windows platforms. You may also find it useful for instructions on how to install binary prerequisites even if you want to compile PyTables itself on Windows.

2.2.1. Windows prerequisites

First, make sure that you have Python 2.3, 2.4, 2.5 or higher, HDF5 1.6.5 or higher and numarray 1.5.2 or higher installed (I have built the PyTables binaries using HDF5 1.6.5 and numarray 1.5.2).

For the HDF5 library, it should be enough to manually copy the hdf5dll.dll, zlib1.dll and szipdll.dll files to a directory in your PATH environment variable (for example C:\WINDOWS\SYSTEM32) or to python_installation_path\Lib\site-packages\tables (this last directory may not exist yet, so if you want to install the DLLs there, you should do so after installing the PyTables package).

Caveat: When downloading the binary distribution of the HDF5 libraries, select one compiled with MSVC 6.0 if you are using Python 2.3.x, such as the package 5-165-win.zip. The file 5-165-win-net.zip was compiled with MSVC 7.1 (aka ".NET 2003"), and it is the one you must choose if you want to run PyTables with the Python 2.4.x or 2.5.x series. You have been warned!
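The rule in this caveat can be written down as a small lookup. This is only a sketch restating the text: the function name hdf5_package_for is hypothetical, and Python versions not mentioned in the caveat return None rather than a guess.

```python
def hdf5_package_for(python_version):
    """Return the HDF5 binary package named in the caveat for a Python version.

    `python_version` is a tuple such as (2, 4) or (2, 4, 3); only the
    first two items are used.
    """
    major_minor = tuple(python_version[:2])
    if major_minor == (2, 3):
        return "5-165-win.zip"        # compiled with MSVC 6.0
    if major_minor in ((2, 4), (2, 5)):
        return "5-165-win-net.zip"    # compiled with MSVC 7.1
    return None                       # not covered by the caveat
```

You could call it as hdf5_package_for(sys.version_info) to see which archive matches the running interpreter.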

To enable compression with the optional LZO or bzip2 libraries (see Section 5.3 for hints about how they may be used to improve performance), fetch and install the LZO (choose v1.x; LZO v2.x is not supported in precompiled Windows builds) and bzip2 binaries from [20][6]. Normally, you will only need to fetch and install the <package>-<version>-bin.zip file and copy the lzo1.dll or bzip2.dll files into a directory in the PATH environment variable, or into python_installation_path\Lib\site-packages\tables (this last directory may not exist yet, so if you want to install the DLLs there, you should do so after installing the PyTables package), so that they can be found by the PyTables extensions.

Please note that PyTables has internal machinery for dealing with uninstalled optional compression libraries, so you don't need to install either the LZO or the bzip2 dynamic libraries if you don't want to.
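If you are unsure whether the DLLs ended up somewhere in your PATH, a few lines of Python can tell you. This is a sketch for checking your setup, not part of PyTables; the helper name find_on_path is made up, and it looks only at the PATH directories, not at the site-packages\tables directory.

```python
import os

def find_on_path(filename, path=None):
    """Return the full path of `filename` in the first PATH directory
    that contains it, or None if it is not found.

    `path` defaults to the PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, filename)
        if directory and os.path.isfile(candidate):
            return candidate
    return None

# DLL names taken from this section; only the HDF5 ones are required.
for dll in ("hdf5dll.dll", "zlib1.dll", "szipdll.dll", "lzo1.dll", "bzip2.dll"):
    print("%s -> %s" % (dll, find_on_path(dll)))
```

A result of None for lzo1.dll or bzip2.dll is harmless if you do not plan to use those compressors.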

2.2.2. PyTables package installation

Download the tables-<version>.win32-py<version>.exe file and execute it.

You can, and should, test your installation by unpacking the source tarball, changing to the tables/tests/ subdirectory and executing the test_all.py script. If all the tests pass (possibly with a few warnings related to the potential unavailability of the LZO or bzip2 libraries), you already have a working, well-tested copy of PyTables installed! If any test fails, please try to locate which test module is failing and execute:

 python test_<module>.py -v verbose

and also:

 python test_all.py --show-versions

and mail the output to the developers so that the problem can be fixed in future releases.

You can proceed now to the next chapter to see how to use PyTables.



[5] This is because of recurrent memory problems on some platforms (perhaps some bad interaction between UCL and something else). UCL support will eventually be dropped, so please refrain from creating datasets compressed with it.

[6] Note that support for the UCL compressor has been declared deprecated and has not been added in the binary build of PyTables for Windows.