PyTables User's Guide

Hierarchical datasets in Python - Release 1.4

Francesc Altet

Ivan Vilata

Scott Prater

Vicent Mas

Tom Hedley

Antonio Valentino

Jeffrey Whitaker

Copyright Notice and Statement for PyTables Software Library and Utilities

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Copyright Notice and Statement for NCSA Hierarchical Data Format (HDF) Software Library and Utilities

NCSA HDF5 (Hierarchical Data Format 5) Software Library and Utilities Copyright 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005 by the Board of Trustees of the University of Illinois. All rights reserved.

See more information about the terms of this license at: http://hdf.ncsa.uiuc.edu/HDF5/doc/Copyright.html

Copyright Notice and Statement for the lrucache.py module

Copyright 2004 Evan Prodromou. Licensed under the Academic Free License 2.1.

See more information about the terms of this license at: http://opensource.org/licenses/afl-2.1.php

$LastChangedDate: 2006-12-21 10:05:30 +0100 (Thu, 21 Dec 2006) $


Table of Contents

I. The PyTables Core Library
1. Introduction
1.1. Main Features
1.2. The Object Tree
2. Installation
2.1. Installation from source
2.1.1. Prerequisites
2.1.2. PyTables package installation
2.2. Binary installation (Windows)
2.2.1. Windows prerequisites
2.2.2. PyTables package installation
3. Tutorials
3.1. Getting started
3.1.1. Importing tables objects
3.1.2. Declaring a Column Descriptor
3.1.3. Creating a PyTables file from scratch
3.1.4. Creating a new group
3.1.5. Creating a new table
3.1.6. Reading (and selecting) data in a table
3.1.7. Creating new array objects
3.1.8. Closing the file and looking at its content
3.2. Browsing the object tree
3.2.1. Traversing the object tree
3.2.2. Setting and getting user attributes
3.2.3. Getting object metadata
3.2.4. Reading data from Array objects
3.3. Committing data to tables and arrays
3.3.1. Appending data to an existing table
3.3.2. Modifying data in tables
3.3.3. Modifying data in arrays
3.3.4. And finally... how to delete rows from a table
3.4. Multidimensional table cells and automatic sanity checks
3.4.1. Shape checking
3.4.2. Field name checking
3.4.3. Data type checking
3.5. Exercising the Undo/Redo feature
3.5.1. A basic example
3.5.2. A more complete example
3.6. Using enumerated types
3.6.1. Enumerated columns
3.6.2. Enumerated arrays
3.7. Dealing with nested structures in tables
3.7.1. Nested table creation
3.7.2. Reading nested tables: introducing NestedRecArray objects
3.7.3. Using Cols accessor
3.7.4. Accessing meta-information of nested tables
3.8. Other examples in PyTables distribution
4. Library Reference
4.1. tables variables and functions
4.1.1. Global variables
4.1.2. Global functions
4.2. The File class
4.2.1. File instance variables
4.2.2. File methods
4.2.3. File special methods
4.3. The Node class
4.3.1. Node instance variables
4.3.2. Node methods
4.4. The Group class
4.4.1. Group instance variables
4.4.2. Group methods
4.4.3. Group special methods
4.5. The Leaf class
4.5.1. Leaf instance variables
4.5.2. Leaf methods
4.6. The Table class
4.6.1. Table instance variables
4.6.2. Table methods
4.6.3. Table special methods
4.6.4. The Row class
4.7. The Cols class
4.7.1. Cols instance variables
4.7.2. Cols methods
4.8. The Description class
4.8.1. Description instance variables
4.8.2. Description methods
4.9. The Column class
4.9.1. Column instance variables
4.9.2. Column methods
4.9.3. Column special methods
4.10. The Array class
4.10.1. Array instance variables
4.10.2. Array methods
4.10.3. Array special methods
4.11. The CArray class
4.11.1. CArray instance variables
4.11.2. Example of use
4.12. The EArray class
4.12.1. EArray instance variables
4.12.2. EArray methods
4.13. The VLArray class
4.13.1. VLArray instance variables
4.13.2. VLArray methods
4.13.3. VLArray special methods
4.14. The UnImplemented class
4.15. The AttributeSet class
4.15.1. AttributeSet instance variables
4.15.2. AttributeSet methods
4.16. Declarative classes
4.16.1. The IsDescription class
4.16.2. The Col class and its descendants
4.16.3. The Atom class and its descendants
4.17. Helper classes
4.17.1. The Filters class
4.17.2. The IndexProps class
4.17.3. The Index class
4.17.4. The Enum class
5. Optimization tips
5.1. Informing PyTables about expected number of rows in tables
5.2. Accelerating your searches
5.2.1. In-kernel searches
5.2.2. Indexed searches
5.3. Compression issues
5.4. Shuffling (or how to make the compression process more effective)
5.5. Using Psyco
5.6. Getting the most from the node LRU cache
5.7. Selecting a User Entry Point (UEP) in your tree
5.8. Compacting your PyTables files
II. Complementary modules
6. FileNode - simulating a filesystem with PyTables
6.1. What is FileNode?
6.2. Finding a FileNode node
6.3. FileNode - simulating files inside PyTables
6.3.1. Creating a new file node
6.3.2. Using a file node
6.3.3. Opening an existing file node
6.3.4. Adding metadata to a file node
6.4. Complementary notes
6.5. Current limitations
6.6. FileNode module reference
6.6.1. Global constants
6.6.2. Global functions
6.6.3. The FileNode abstract class
6.6.4. The ROFileNode class
6.6.5. The RAFileNode class
7. NetCDF - a PyTables NetCDF3 emulation API
7.1. What is NetCDF?
7.2. Using the tables.NetCDF module
7.2.1. Creating/Opening/Closing a tables.NetCDF file
7.2.2. Dimensions in a tables.NetCDF file
7.2.3. Variables in a tables.NetCDF file
7.2.4. Attributes in a tables.NetCDF file
7.2.5. Writing data to and retrieving data from a tables.NetCDF variable
7.2.6. Efficient compression of tables.NetCDF variables
7.3. tables.NetCDF module reference
7.3.1. Global constants
7.3.2. The NetCDFFile class
7.3.3. The NetCDFVariable class
7.4. Converting between true netCDF files and tables.NetCDF files
7.5. tables.NetCDF file structure
7.6. Sharing data in tables.NetCDF files over the internet with OPeNDAP
7.7. Differences between the Scientific.IO.NetCDF API and the tables.NetCDF API
III. Appendixes
A. Supported data types in PyTables
B. Using nested record arrays
B.1. Introduction
B.2. NestedRecArray methods
B.3. NestedRecord objects
C. Utilities
C.1. ptdump
C.1.1. Usage
C.1.2. A small tutorial on ptdump
C.2. ptrepack
C.2.1. Usage
C.2.2. A small tutorial on ptrepack
C.3. nctoh5
C.3.1. Usage
D. PyTables File Format
D.1. Mandatory attributes for a File
D.2. Mandatory attributes for a Group
D.3. Mandatory attributes, storage layout and supported data types for Leaves
D.3.1. Table format
D.3.2. Array format
D.3.3. CArray format
D.3.4. EArray format
D.3.5. VLArray format
Bibliography

List of Figures

1.1. An HDF5 example with 2 subgroups, 2 tables and 1 array.
1.2. A PyTables object tree example.
3.1. The initial version of the data file for tutorial 1, with a view of the data objects.
3.2. The final version of the data file for tutorial 1.
3.3. General properties of the /detector/readout table.
3.4. Table hierarchy for tutorial 2.
5.1. Times for different selection modes over Int32 values. Benchmark made on a machine with Itanium (IA64) @ 900 MHz processors with SCSI disk @ 10K RPM.
5.2. Times for different selection modes over Float64 values. Benchmark made on a machine with Itanium (IA64) @ 900 MHz processors with SCSI disk @ 10K RPM.
5.3. Times for indexing a couple of columns of data type Int32 and Float64. Benchmark made on a machine with Itanium (IA64) @ 900 MHz processors with SCSI disk @ 10K RPM.
5.4. Comparison between different compression libraries.
5.5. Comparison between different compression levels of Zlib.
5.6. Writing tables with several compressors.
5.7. Selecting values in tables with several compressors. The file is not in the OS cache.
5.8. Selecting values in tables with several compressors. The file is in the OS cache.
5.9. Writing in tables with different levels of compression.
5.10. Selecting values in tables with different levels of compression. The file is in the OS cache.
5.11. Comparison between different compression libraries with and without the shuffle filter.
5.12. Writing with different compression libraries with and without the shuffle filter.
5.13. Reading with different compression libraries with the shuffle filter. The file is not in the OS cache.
5.14. Reading with different compression libraries with and without the shuffle filter. The file is in the OS cache.
5.15. Writing tables with/without Psyco.
5.16. Reading tables with/without Psyco.
5.17. Complete tree in file test.h5, and subtree of interest for the user.
5.18. Resulting object tree derived from the use of the rootUEP parameter.

List of Tables

5.1. Retrieval speed and memory consumption as a function of the number of nodes in the LRU cache.
A.1. Data types supported for array elements and table columns in PyTables.