PyTables User's Guide

Hierarchical datasets in Python - Release 2.0.4

Francesc Altet

Ivan Vilata

Scott Prater

Vicent Mas

Tom Hedley

Antonio Valentino

Jeffrey Whitaker

Copyright Notice and Statement for PyTables User's Guide.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

a. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

b. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

c. Neither the name of the Carabos Coop. V. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Last changed: 2008-07-04 15:20:53 +0200 (Fri, 04 Jul 2008)


Table of Contents

I. The PyTables Core Library
1. Introduction
1.1. Main Features
1.2. The Object Tree
2. Installation
2.1. Installation from source
2.1.1. Prerequisites
2.1.2. PyTables package installation
2.2. Binary installation (Windows)
2.2.1. Windows prerequisites
2.2.2. PyTables package installation
3. Tutorials
3.1. Getting started
3.1.1. Importing tables objects
3.1.2. Declaring a Column Descriptor
3.1.3. Creating a PyTables file from scratch
3.1.4. Creating a new group
3.1.5. Creating a new table
3.1.6. Reading (and selecting) data in a table
3.1.7. Creating new array objects
3.1.8. Closing the file and looking at its content
3.2. Browsing the object tree
3.2.1. Traversing the object tree
3.2.2. Setting and getting user attributes
3.2.3. Getting object metadata
3.2.4. Reading data from Array objects
3.3. Committing data to tables and arrays
3.3.1. Appending data to an existing table
3.3.2. Modifying data in tables
3.3.3. Modifying data in arrays
3.3.4. And finally... how to delete rows from a table
3.4. Multidimensional table cells and automatic sanity checks
3.4.1. Shape checking
3.4.2. Field name checking
3.4.3. Data type checking
3.5. Exercising the Undo/Redo feature
3.5.1. A basic example
3.5.2. A more complete example
3.6. Using enumerated types
3.6.1. Enumerated columns
3.6.2. Enumerated arrays
3.7. Dealing with nested structures in tables
3.7.1. Nested table creation
3.7.2. Reading nested tables
3.7.3. Using Cols accessor
3.7.4. Accessing meta-information of nested tables
3.8. Other examples in PyTables distribution
4. Library Reference
4.1. tables variables and functions
4.1.1. Global variables
4.1.2. Global functions
4.2. The File class
4.2.1. File instance variables
4.2.2. File methods — file handling
4.2.3. File methods — hierarchy manipulation
4.2.4. File methods — tree traversal
4.2.5. File methods — Undo/Redo support
4.2.6. File methods — attribute handling
4.3. The Node class
4.3.1. Node instance variables — location dependent
4.3.2. Node instance variables — location independent
4.3.3. Node instance variables — attribute shorthands
4.3.4. Node methods — hierarchy manipulation
4.3.5. Node methods — attribute handling
4.4. The Group class
4.4.1. Group instance variables
4.4.2. Group methods
4.4.3. Group special methods
4.5. The Leaf class
4.5.1. Leaf instance variables
4.5.2. Leaf instance variables — aliases
4.5.3. Leaf methods
4.6. The Table class
4.6.1. Table instance variables
4.6.2. Table methods — reading
4.6.3. Table methods — writing
4.6.4. Table methods — querying
4.6.5. Table methods — other
4.6.6. The Description class
4.6.7. The Row class
4.6.8. The Cols class
4.6.9. The Column class
4.7. The Array class
4.7.1. Array instance variables
4.7.2. Array methods
4.7.3. Array special methods
4.8. The CArray class
4.8.1. Example of use
4.9. The EArray class
4.9.1. EArray methods
4.9.2. Example of use
4.10. The VLArray class
4.10.1. VLArray instance variables
4.10.2. VLArray methods
4.10.3. VLArray special methods
4.10.4. Example of use
4.11. The UnImplemented class
4.12. The AttributeSet class
4.12.1. Notes on native and pickled attributes
4.12.2. AttributeSet instance variables
4.12.3. AttributeSet methods
4.13. Declarative classes
4.13.1. The IsDescription class
4.13.2. The Col class and its descendants
4.13.3. The Atom class and its descendants
4.14. Helper classes
4.14.1. The Filters class
4.14.2. The Index class
4.14.3. The Enum class
5. Optimization tips
5.1. Informing PyTables about expected number of rows in tables or arrays
5.2. Accelerating your searches
5.2.1. In-kernel searches
5.2.2. Indexed searches
5.3. Compression issues
5.4. Shuffling (or how to make the compression process more effective)
5.5. Using Psyco
5.6. Getting the most from the node LRU cache
5.7. Compacting your PyTables files
II. Complementary modules
6. filenode - simulating a filesystem with PyTables
6.1. What is filenode?
6.2. Finding a filenode node
6.3. filenode - simulating files inside PyTables
6.3.1. Creating a new file node
6.3.2. Using a file node
6.3.3. Opening an existing file node
6.3.4. Adding metadata to a file node
6.4. Complementary notes
6.5. Current limitations
6.6. filenode module reference
6.6.1. Global constants
6.6.2. Global functions
6.6.3. The FileNode abstract class
6.6.4. The ROFileNode class
6.6.5. The RAFileNode class
7. netcdf3 - a PyTables NetCDF3 emulation API
7.1. What is netcdf3?
7.2. Using the tables.netcdf3 package
7.2.1. Creating/Opening/Closing a tables.netcdf3 file
7.2.2. Dimensions in a tables.netcdf3 file
7.2.3. Variables in a tables.netcdf3 file
7.2.4. Attributes in a tables.netcdf3 file
7.2.5. Writing data to and retrieving data from a tables.netcdf3 variable
7.2.6. Efficient compression of tables.netcdf3 variables
7.3. tables.netcdf3 package reference
7.3.1. Global constants
7.3.2. The NetCDFFile class
7.3.3. The NetCDFVariable class
7.4. Converting between true netCDF files and tables.netcdf3 files
7.5. tables.netcdf3 file structure
7.6. Sharing data in tables.netcdf3 files over the internet with OPeNDAP
7.7. Differences between the Scientific.IO.NetCDF API and the tables.netcdf3 API
III. Appendixes
A. Supported data types in PyTables
B. Condition syntax
C. Using nested record arrays
C.1. Introduction
C.2. NestedRecArray methods
C.3. NestedRecord objects
D. Utilities
D.1. ptdump
D.1.1. Usage
D.1.2. A small tutorial on ptdump
D.2. ptrepack
D.2.1. Usage
D.2.2. A small tutorial on ptrepack
D.3. nctoh5
D.3.1. Usage
E. PyTables File Format
E.1. Mandatory attributes for a File
E.2. Mandatory attributes for a Group
E.3. Optional attributes for a Group
E.4. Mandatory attributes, storage layout and supported data types for Leaves
E.4.1. Table format
E.4.2. Array format
E.4.3. CArray format
E.4.4. EArray format
E.4.5. VLArray format
E.5. Optional attributes for Leaves
Bibliography

List of Figures

1.1. An HDF5 example with 2 subgroups, 2 tables and 1 array.
1.2. A PyTables object tree example.
3.1. The initial version of the data file for tutorial 1, with a view of the data objects.
3.2. The final version of the data file for tutorial 1.
3.3. General properties of the /detector/readout table.
3.4. Table hierarchy for tutorial 2.
5.1. Times for different sequential selection modes over Float64 values. Benchmark made on a machine with AMD Opteron (AMD64) @ 2 GHz processors with IDE disk @ 7200 RPM.
5.2. Times for indexing a Float64 column. Benchmark made on a machine with AMD Opteron (AMD64) @ 2 GHz processors with IDE disk @ 7200 RPM.
5.3. Times for querying a Float64 column with a cold cache (mean of the first 10 queries). Benchmark made on a machine with AMD Opteron (AMD64) @ 2 GHz processors with IDE disk @ 7200 RPM.
5.4. Times for querying a Float64 column with a warm cache (mean of 500 queries). Benchmark made on a machine with AMD Opteron (AMD64) @ 2 GHz processors with IDE disk @ 7200 RPM.
5.5. Times for querying a Float64 column when the query is already in the cache. Benchmark made on a machine with AMD Opteron (AMD64) @ 2 GHz processors with IDE disk @ 7200 RPM.
5.6. Times for doing a query with different numbers of hits on an indexed table with one gigarow. Benchmark made on a machine with AMD Opteron (AMD64) @ 2 GHz processors with IDE disk @ 7200 RPM.
5.7. Comparison between different compression libraries.
5.8. Comparison between different compression levels of Zlib.
5.9. Writing tables with several compressors.
5.10. Selecting values in tables with several compressors. The file is not in the OS cache.
5.11. Selecting values in tables with several compressors. The file is in the OS cache.
5.12. Writing in tables with different levels of compression.
5.13. Selecting values in tables with different levels of compression. The file is in the OS cache.
5.14. Comparison between different compression libraries with and without the shuffle filter.
5.15. Writing with different compression libraries with and without the shuffle filter.
5.16. Reading with different compression libraries with the shuffle filter. The file is not in the OS cache.
5.17. Reading with different compression libraries with and without the shuffle filter. The file is in the OS cache.
5.18. Writing tables with/without Psyco.
5.19. Reading tables with/without Psyco.

List of Tables

5.1. Retrieval speed and memory consumption depending on the number of nodes in the LRU cache.
A.1. Data types supported for array elements and table columns in PyTables.