PyTables implements several classes to represent the different nodes in the object tree. They are named File, Group, Leaf, Table, Array, CArray, EArray, VLArray and UnImplemented. Another one, named AttributeSet, allows the user to complement the information on these different objects. Finally, an important class called IsDescription allows the user to build a Table record description by declaring a subclass of it. Many other classes are defined in PyTables, but they can be regarded as helpers whose main goal is to declare the data type properties of the different first-class objects; they will be described at the end of this chapter as well.
An important function, called openFile, is responsible for creating, opening and appending to files. In addition, a few utility functions are defined to guess whether a user-supplied file is a PyTables or HDF5 file. These are called isPyTablesFile() and isHDF5File(), respectively. Finally, there is a function called whichLibVersion that reports the versions of the underlying C libraries (for example, HDF5 or Zlib).
Let's start by discussing the first-level variables and functions available to the user, and then the different classes defined in PyTables.
The PyTables version number.
The underlying HDF5 library version number.
An easy way of copying one PyTables file to another.

This function allows you to copy an existing PyTables file named srcfilename to another file called dstfilename. The source file must exist and be readable. The destination file can be overwritten in place if it already exists by setting the overwrite argument to true. This function is a shorthand for the File.copyFile() method, which acts on an already opened file. kwargs takes keyword arguments used to customize the copying process; see the documentation of File.copyFile() (see description) for a description of those arguments.
Determine whether a file is in the HDF5 format.

When successful, it returns a true value if the file is an HDF5 file, false otherwise. If there were problems identifying the file, an HDF5ExtError is raised.

Determine whether a file is in the PyTables format.

When successful, it returns a true value if the file is a PyTables file, false otherwise. The true value is the format version string of the file. If there were problems identifying the file, an HDF5ExtError is raised.
Open a PyTables (or generic HDF5) file and return a File object.

The name of the file (supports environment variable expansion). It is suggested that it have one of the ".h5", ".hdf" or ".hdf5" extensions, although this is not mandatory.
The mode to open the file. It can be one of the following:

'r': read-only; no data can be modified.

'w': write; a new file is created (an existing file with the same name would be deleted).

'a': append; an existing file is opened for reading and writing, and if the file does not exist it is created.

'r+': similar to 'a', but the file must already exist.
If filename is new, this will set a title for the root group in this file. If filename is not new, the title will be read from disk, and this will not have any effect.
A dictionary to map names in the object tree Python namespace into different HDF5 names in file namespace. The keys are the Python names, while the values are the HDF5 names. This is useful when you need to use HDF5 node names with invalid or reserved words in Python.
The root User Entry Point. This is a group in the HDF5 hierarchy which will be taken as the starting point to create the object tree. The group has to be named after its HDF5 name and can be a path. If it does not exist, an HDF5ExtError exception is issued. Use this if you do not want to build the entire object tree, but rather only a subtree of it.
An instance of the Filters class (see Section 4.17.1) that provides information about the desired I/O filters applicable to the leaves that hang directly from the root (unless other filter properties are specified for these leaves). Moreover, if you do not specify filter properties for its child groups, they will inherit these ones. So, if you open a new file with this parameter set, all the leaves created in the file will recursively inherit these filter properties (unless you prevent that from happening by specifying other filters on the child groups or leaves).
The number of unreferenced nodes to be kept in memory. Least recently used nodes are unloaded from memory when this number of loaded nodes is reached. To load a node again, simply access it as usual. Nodes referenced by user variables are not taken into account nor unloaded.
Get version information about a C library.

If the library indicated by name is available, this function returns a 3-tuple containing the major library version as an integer, its full version as a string, and the version date as a string. If the library is not available, None is returned.

The currently supported library names are hdf5, zlib, lzo, ucl (in the process of being deprecated) and bzip2. If another name is given, a ValueError is raised.
An instance of this class is returned when a PyTables file is opened with the openFile() function. It offers methods to manipulate (create, rename, delete...) nodes and handle their attributes, as well as methods to traverse the object tree. The user entry point to the object tree attached to the HDF5 file is represented in the rootUEP attribute. Other attributes are available.

File objects support an Undo/Redo mechanism which can be enabled with the enableUndo() method. Once the Undo/Redo mechanism is enabled, explicit marks (with an optional unique name) can be set on the state of the database using the mark() method. There are two implicit marks which are always available: the initial mark (0) and the final mark (-1). Both the identifier of a mark and its name can be used in undo and redo operations.
Hierarchy manipulation operations (node creation, movement and removal) and attribute handling operations (setting and deleting) made after a mark can be undone by using the undo() method, which returns the database to the state of a past mark. If undo() is not followed by operations that modify the hierarchy or attributes, the redo() method can be used to return the database to the state of a future mark. Otherwise, future states of the database are forgotten.

Note that data handling operations cannot currently be undone or redone. Also, hierarchy manipulation operations on nodes that do not support the Undo/Redo mechanism issue an UndoRedoWarning before changing the database.

The Undo/Redo mechanism is persistent between sessions and can only be disabled by calling the disableUndo() method.
The name of the opened file.
The PyTables version number of this file.
True if the underlying file is open, false otherwise.
The mode in which the file was opened.
The title of the root group in the file.
A dictionary that maps node names between PyTables and HDF5 domain names. Its initial values are set from the trMap parameter passed to the openFile function. You cannot change its contents after a file is opened.
The UEP (user entry point) group in the file (see description).
Default filter properties for the root group (see 4.17.1).
The root of the object tree hierarchy (a Group instance).
A dictionary which maps path names to objects, for every visible node in the tree (deprecated, see note below).
A dictionary which maps path names to objects, for every visible group in the tree (deprecated, see note below).
A dictionary which maps path names to objects, for every visible leaf in the tree (deprecated, see note below).
Note: From PyTables 1.2 on, the dictionaries objects, groups and leaves are just instances of objects faking the old functionality. Internally, they use File.getNode() (see description) and File.walkNodes() (see description), which are recommended instead.
Create a new Group instance with name name in where location.

The parent group from which the new group will hang. The where parameter can be a path string (for example "/level1/group5") or another Group instance.
The name of the new group.
A description for this group.
An instance of the Filters class (see Section 4.17.1) that provides information about the desired I/O filters applicable to the leaves that hang directly from this new group (unless other filter properties are specified for these leaves). Moreover, if you do not specify filter properties for its child groups, they will inherit these ones.
Whether to create the needed groups for the parent path to exist (not done by default).
Create a new Table instance with name name in where location. See Section 4.6 for a description of the Table class.

The parent group from which the new table will hang. The where parameter can be a path string (for example "/level1/leaf5") or a Group instance.

The name of the new table.

This is an object that describes the table, that is, how many columns it has, the properties of each column (the type, the shape, etc.) as well as other table properties. description can be any of the following objects:

A user-defined class: This should inherit from the IsDescription class (see 4.16.1), where table fields are specified.
A dictionary: For example, when you do not know beforehand what structure your table will have. See Section 3.4 for an example of use.
RecArray: This object from the numarray package is also accepted, and all the information about columns and other metadata is used as a basis to create the Table object. Moreover, if the RecArray has actual data, this is also injected into the newly created Table object.

NestedRecArray: Finally, if you want to have nested columns in your table, you can use this object (see Appendix B), and all the information about columns and other metadata is used as a basis to create the Table object. Moreover, if the NestedRecArray has actual data, this is also injected into the newly created Table object.
A description for this object.
An instance of the Filters class (see Section 4.17.1) that provides information about the desired I/O filters to be applied during the life of this object.

A user estimate of the number of records that will be in the table. If not provided, the default value is appropriate for tables up to about 10 MB in size. If you plan to save bigger tables, you should provide a guess; this will optimize the time and memory used by the HDF5 B-Tree creation and management process. See Section 5.1 for a discussion of that issue.
Whether to create the needed groups for the parent path to exist (not done by default).
Create a new Array instance with name name in where location. See Section 4.10 for a description of the Array class.

The regular array to be saved. Currently accepted values are: NumPy, Numeric and numarray arrays (including CharArray string numarrays), or other native Python types, provided that they are regular (i.e. they are not like [[1,2],2]) and homogeneous (i.e. all the elements are of the same type). Objects that have some of their dimensions equal to zero are not supported (use an EArray object if you want to create an array with one of its dimensions equal to 0).
Whether to create the needed groups for the parent path to exist (not done by default).
See the createTable description for more information on the where, name and title parameters.
Create a new CArray instance with name name in where location. See Section 4.11 for a description of the CArray class.

The shape of the objects to be saved.

An Atom instance representing the shape, type and flavor of the chunks of the objects to be saved.

Whether to create the needed groups for the parent path to exist (not done by default).

See the createTable description for more information on the where, name and title parameters.
Create a new EArray instance with name name in where location. See Section 4.12 for a description of the EArray class.

An Atom instance representing the shape, type and flavor of the atomic objects to be saved. One (and only one) of the shape dimensions must be 0; the resulting EArray object can be extended along that dimension. Multiple enlargeable dimensions are not supported right now. See Section 4.16.3 for the supported set of Atom class descendants.
In the case of enlargeable arrays, this represents a user estimate of the number of row elements that will be added to the growable dimension in the EArray object. If not provided, the default value is 1000 rows. If you plan to create either much smaller or much bigger EArrays, try providing a guess; this will optimize the time and the amount of memory used by the HDF5 B-Tree creation and management process.
Whether to create the needed groups for the parent path to exist (not done by default).
See the createTable description for more information on the where, name, title and filters parameters.
Create a new VLArray instance with name name in where location. See Section 4.13 for a description of the VLArray class.

An Atom instance representing the shape, type and flavor of the atomic objects to be saved. See Section 4.16.3 for the supported set of Atom class descendants.

A user estimate of the size (in MB) of the final VLArray object. If not provided, the default value is 1 MB. If you plan to create either much smaller or much bigger VLArrays, try providing a guess; this will optimize the time and the amount of memory used by the HDF5 B-Tree creation and management process.

Whether to create the needed groups for the parent path to exist (not done by default).

See the createTable description for more information on the where, name, title and filters parameters.
Get the node under where with the given name.

where can be a Node instance or a path string leading to a node. If no name is specified, that node is returned.

If a name is specified, it must be a string with the name of a node under where. In this case the where argument can only lead to a Group instance (else a TypeError is raised). The node called name under the group where is returned.

In both cases, if the node to be returned does not exist, a NoSuchNodeError is raised. Please note that hidden nodes are also considered.

If the classname argument is specified, it must be the name of a class derived from Node. If the node is found but it is not an instance of that class, a NoSuchNodeError is also raised.
Is the node under path visible?

If the node does not exist, a NoSuchNodeError is raised.
Returns the attribute attrname under the where.name location.

These arguments work as in getNode() (see description), referencing the node to be acted upon.

The name of the attribute to get.
Sets the attribute attrname with value attrvalue under the where.name location.

If the node already has a large number of attributes, a PerformanceWarning will be issued.

These arguments work as in getNode() (see description), referencing the node to be acted upon.

The name of the attribute to set on disk.

The value of the attribute to set. Any kind of Python object (like strings, ints, floats, lists, tuples, dicts, small Numeric/NumPy/numarray objects...) can be stored as an attribute. However, if necessary, (c)Pickle is automatically used to serialize objects that you might want to save (see 4.15 for details).
Delete the attribute attrname in the where.name location.

These arguments work as in getNode() (see description), referencing the node to be acted upon.

The name of the attribute to delete on disk.
Copy the attributes from the node at where.name to dstnode.

These arguments work as in getNode() (see description), referencing the node to be acted upon.

This is the destination node where the attributes will be copied. It can be either a path string or a Node object.
Returns an iterator yielding the child nodes hanging from where. These nodes are alphanumerically sorted by node name.

This argument works as in getNode() (see description), referencing the node to be acted upon.

If the name of a class derived from Node is supplied in the classname parameter, only instances of that class (or subclasses of it) will be returned.
Returns a list with the child nodes hanging from where. The list is alphanumerically sorted by node name.

This argument works as in getNode() (see description), referencing the node to be acted upon.

If the name of a class derived from Node is supplied in the classname parameter, only instances of that class (or subclasses of it) will be returned.
Removes the node called name under the where location.

These arguments work as in getNode() (see description), referencing the node to be acted upon.

If not supplied, the object will be removed only if it has no children; if it does, a NodeError will be raised. If supplied with a true value, the object and all its descendants will be completely removed.
Copy the node specified by where and name to newparent/newname.

These arguments work as in getNode() (see description), referencing the node to be acted upon.

The destination group that the node will be copied to (a path name or a Group instance). If newparent is None, the parent of the source node is selected as the new parent.

The name to be assigned to the new copy in its destination (a string). If newname is None or not specified, the name of the source node is used.

Whether the possibly existing node newparent/newname should be overwritten or not. Note that trying to copy over an existing node without overwriting it will issue a NodeError.

Specifies whether the copy should recurse into the children of the copied node. This argument is ignored for leaf nodes. The default is not to recurse.

Whether to create the needed groups for the new parent path to exist (not done by default).
Additional keyword arguments may be passed to customize the copying process. The supported arguments depend on the kind of node being copied. The following are some of them:

The new title for the destination. If None, the original title is used. This only applies to the topmost node for recursive copies.

Specifying this parameter overrides the original filter properties in the source node. If specified, it must be an instance of the Filters class (see Section 4.17.1). The default is to copy the filter attribute from the source node.

You can prevent the user attributes from being copied by setting this parameter to False. The default is to copy them.

Specify the range of rows in child leaves to be copied; the default is to copy all the rows.

This argument may be used to collect statistics on the copy process. When used, it should be a dictionary with keys groups, leaves and bytes having a numeric value. Their values will be incremented to reflect the number of groups, leaves and bytes, respectively, that have been copied in the operation.
Change the name of the node specified by where and name to newname.

These arguments work as in getNode() (see description), referencing the node to be acted upon.

The new name to be assigned to the node (a string).
Move the node specified by where and name to newparent/newname.

These arguments work as in getNode() (see description), referencing the node to be acted upon.

The destination group the node will be moved to (a path name or a Group instance). If newparent is None, the original node parent is selected as the new parent.

The new name to be assigned to the node in its destination (a string). If newname is None or not specified, the original node name is used.

The other arguments work as in Node._f_move() (see description).
Iterator that returns the list of Groups (not Leaves) hanging from (and including) where. The where Group is listed first (pre-order), then each of its child Groups (following an alphanumerical order) is traversed, following the same procedure. If where is not supplied, the root group is used.

The origin group. Can be a path string or a Group instance.
Recursively iterate over the nodes in the File instance. It takes two parameters:

If supplied, the iteration starts from (and includes) this group.

(String) If supplied, only instances of this class are returned.

Example of use:

# Recursively print all the nodes hanging from '/detector'
print "Nodes hanging from group '/detector':"
for node in h5file.walkNodes("/detector"):
    print node
Copy the children of a group into another group.

This method copies the nodes hanging from the source group srcgroup into the destination group dstgroup. Existing destination nodes can be replaced by setting the overwrite argument to true. If the recursive argument is true, all descendant nodes of srcgroup are recursively copied. If createparents is true, the needed groups for the given destination parent group path to exist will be created.

kwargs takes keyword arguments used to customize the copying process. See the documentation of Group._f_copyChildren() (see description) for a description of those arguments.
Copy the contents of this file to dstfilename.

dstfilename must be a path string indicating the name of the destination file. If it already exists, the copy will fail with an IOError, unless the overwrite argument is true, in which case the destination file will be overwritten in place. In this last case, the destination file should be closed beforehand, or else errors will happen.

Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may or may not be copied, data may be sub-sampled, stats may be collected, etc. Arguments unknown to nodes are simply ignored. Check the documentation for copying operations of nodes to see which options they support.

Copying a file usually has the beneficial side effect of creating a more compact and cleaner version of the original file.
Is the Undo/Redo mechanism enabled?

Returns True if the Undo/Redo mechanism has been enabled for this file, False otherwise. Please note that this mechanism is persistent, so a newly opened PyTables file may already have Undo/Redo support enabled.
Enable the Undo/Redo mechanism.

This operation prepares the database for undoing and redoing modifications in the node hierarchy. This allows mark(), undo(), redo() and other methods to be called.

The filters argument, when specified, must be an instance of class Filters (see Section 4.17.1) and is meant for setting the compression values for the action log. The default is having compression enabled, as the gains in terms of space can be considerable. You may want to disable compression if you want maximum speed for Undo/Redo operations.

Calling enableUndo() when the Undo/Redo mechanism is already enabled raises an UndoRedoError.
Disable the Undo/Redo mechanism.

Disabling the Undo/Redo mechanism leaves the database in the current state and forgets past and future database states. This makes mark(), undo(), redo() and other methods fail with an UndoRedoError.

Calling disableUndo() when the Undo/Redo mechanism is already disabled raises an UndoRedoError.
Mark the state of the database.

Creates a mark for the current state of the database. A unique (and immutable) identifier for the mark is returned. An optional name (a string) can be assigned to the mark. Both the identifier of a mark and its name can be used in undo() and redo() operations. When the name has already been used for another mark, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Get the identifier of the current mark.

Returns the identifier of the current mark. This can be used to know the state of a database after an application crash, or to get the identifier of the initial implicit mark after a call to enableUndo().

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Go to a past state of the database.

Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used. If the mark is omitted, the last created mark is used. If there are no past marks, or the specified mark is not older than the current one, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Go to a future state of the database.

Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used. If the mark is omitted, the next created mark is used. If there are no future marks, or the specified mark is not newer than the current one, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
The following describes the methods that automatically trigger actions when a File instance is accessed in a special way.
Is there a node with that path?

Returns True if the file has a node with the given path (a string), False otherwise.
Iterate over the children of the File instance. This iterator does not accept parameters, and it is recursive.

Example of use:

# Recursively list all the nodes in the object tree
h5file = tables.openFile("vlarray1.h5")
print "All nodes in the object tree:"
for node in h5file:
    print node
Prints a short description of the File object.

Example of use:

>>> f = tables.openFile("data/test.h5")
>>> print f
data/test.h5 (File) 'Table Benchmark'
Last modif.: 'Mon Sep 20 12:40:47 2004'
Object Tree:
/ (Group) 'Table Benchmark'
/tuple0 (Table(100L,)) 'This is the table title'
/group0 (Group) ''
/group0/tuple1 (Table(100L,)) 'This is the table title'
/group0/group1 (Group) ''
/group0/group1/tuple2 (Table(100L,)) 'This is the table title'
/group0/group1/group2 (Group) ''
This is the base class for all nodes in a PyTables hierarchy. It is an abstract class, i.e. it may not be directly instantiated; however, every node in the hierarchy is an instance of this class.
A PyTables node is always hosted in a PyTables file, under a parent group, at a certain depth in the node hierarchy. A node knows its own name in the parent group and its own path name in the file. When using a translation map (see 4.2), its HDF5 name might differ from its PyTables name.
All the previous information is location-dependent, i.e. it may change when moving or renaming a node in the hierarchy. A node also has location-independent information, such as its HDF5 object identifier and its attribute set.
This class gathers the operations and attributes (both location-dependent and independent) which are common to all PyTables nodes, whatever their type is. Nonetheless, due to natural naming restrictions, the names of all of these members start with a reserved prefix (see 4.4).
Sub-classes with no children (i.e. leaf nodes) may define new methods, attributes and properties to avoid natural naming restrictions. For instance, _v_attrs may be shortened to attrs and _f_rename to rename. However, the original methods and attributes should still be available.
The hosting File instance (see 4.2).

The parent Group instance (see 4.4).
The depth of this node in the tree (a non-negative integer value).
The name of this node in its parent group (a string).
The name of this node in the hosting HDF5 file (a string).
The path of this node in the tree (a string).
The root group instance. This is deprecated; please use node._v_file.root instead.
The identifier of this node in the hosting HDF5 file.
The associated AttributeSet instance (see 4.15).
Close this node in the tree.
This releases all resources held by the node, so it should not be used again. On nodes with data, it may be flushed to disk.
The closing operation is not recursive, i.e. closing a group does not close its children.
Remove this node from the hierarchy.

If the node has children, recursive removal must be requested by giving recursive a true value; otherwise, a NodeError will be raised.
Move or rename this node.

Moves a node into a new parent group, or changes the name of the node. newparent can be a Group object or a pathname in string form. If it is not specified or None, the current parent group is chosen as the new parent. newname must be a string with a new name. If it is not specified or None, the current name is kept. If createparents is true, the needed groups for the given new parent group path to exist will be created.

Moving a node across databases is not allowed, nor is moving a node into itself. These result in a NodeError. However, moving a node over itself is allowed and simply does nothing. Moving over another existing node is similarly not allowed, unless the optional overwrite argument is true, in which case that node is recursively removed before moving.

Usually, only the first argument will be used, effectively moving the node to a new location without changing its name. Using only the second argument is equivalent to renaming the node in place.
Copy this node and return the new node.

Creates and returns a copy of the node, maybe in a different place in the hierarchy. newparent can be a Group object or a pathname in string form. If it is not specified or None, the current parent group is chosen as the new parent. newname must be a string with a new name. If it is not specified or None, the current name is kept. If a recursive copy is requested, all descendants are copied as well. If createparents is true, the needed groups for the given new parent group path to exist will be created.

Copying a node across databases is supported but cannot be undone. Copying a node over itself is not allowed, nor is recursively copying a node into itself. These result in a NodeError. Copying over another existing node is similarly not allowed, unless the optional overwrite argument is true, in which case that node is recursively removed before copying.

Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may or may not be copied, data may be sub-sampled, stats may be collected, etc. See the documentation for the particular node type.

Using only the first argument is equivalent to copying the node to a new location without changing its name. Using only the second argument is equivalent to making a copy of the node in the same group.
Get a PyTables attribute from this node.

If the named attribute does not exist, an AttributeError is raised.

Set a PyTables attribute for this node.

If the node already has a large number of attributes, a PerformanceWarning is issued.
Instances of this class are a grouping structure containing instances of zero or more groups or leaves, together with supporting metadata.
Working with groups and leaves is similar in many ways to working with directories and files, respectively, in a Unix filesystem. As with Unix directories and files, objects in the object tree are often described by giving their full (or absolute) path names. This full path can be specified either as a string (as in '/group1/group2') or as a complete object path written in the natural naming schema (as in file.root.group1.group2), as discussed in Section 1.2.
A collateral effect of the natural naming schema is that the names of Group members must be carefully chosen to avoid colliding with existing child node names. For this reason, and so as not to pollute the children namespace, it is explicitly forbidden to assign normal attributes to Group instances, and all existing members start with some reserved prefix, like _f_ (for methods) or _v_ (for instance variables). Any attempt to set a new child node whose name starts with one of these prefixes will raise a ValueError exception.

Another effect of natural naming is that nodes with reserved Python names or other disallowed Python names (for example, $a or 44) cannot be accessed using the node.child syntax. You will be forced to use getattr(node, child) and delattr(node, child) to access them.
You can also make use of the trMap (translation map dictionary) parameter of the openFile function (see description) in order to translate HDF5 names not suited for natural naming into more convenient ones.
These instance variables are provided in addition to those
in Node
(see 4.3).
The number of children hanging from this group.
Dictionary with all nodes hanging from this group.
Dictionary with all groups hanging from this group.
Dictionary with all leaves hanging from this group.
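These dictionaries make it easy to tell child groups and leaves apart. A small sketch (the file and node names are hypothetical):

```python
import tables

h5file = tables.openFile("data.h5")   # hypothetical file
root = h5file.root
print root._v_nchildren               # total number of children
for name in root._v_groups:           # names of the child groups
    print "group:", name
for name in root._v_leaves:           # names of the child leaves
    print "leaf:", name
h5file.close()
```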
Default filter properties for child nodes —see
4.17.1.
A shorthand for FILTERS
attribute.
This class defines the __setattr__
,
__getattr__
and __delattr__
methods,
and they set, get and delete ordinary Python attributes
as normally intended.
In addition to that, __getattr__
allows getting
child nodes by their name for the sake of
easy interaction on the command line,
as long as there is no Python attribute with the same name.
Groups also allow the interactive completion
(when using readline
) of the names of child nodes.
For instance:
nchild = group._v_nchildren  # get a Python attribute

# Add a Table child called "table" under "group".
h5file.createTable(group, 'table', myDescription)
table = group.table  # get the table child instance
group.table = 'foo'  # set a Python attribute
# (PyTables warns you here about using the name of a child node.)
foo = group.table    # get a Python attribute
del group.table      # delete a Python attribute
table = group.table  # get the table child instance again
Caveat: The following
methods are documented for completeness, and they can be
used without any problem. However, you should prefer the
high-level counterpart methods in the File
class, since those are the ones mostly used in documentation
and examples, and they are a bit more powerful than the ones
exposed here.
These methods are provided in addition to those
in Node
(see 4.3).
Get the child called childname
of this group.
If the child exists (be it visible or not), it is returned.
Else, a NoSuchNodeError
is raised.
Copy this node and return the new one.
This method has the behavior described in Node._f_copy()
(see description).
In addition, it recognizes the following keyword arguments:
The new title for the destination.
If omitted or None
, the original title is used.
This only applies to the topmost node in recursive copies.
Specifying this parameter overrides the original
filter properties in the source node.
If specified, it must be an instance of the Filters
class (see Section 4.17.1).
The default is to copy the filter properties
from the source node.
You can prevent the user attributes from being copied
by setting this parameter to False
.
The default is to copy them.
This argument may be used to collect statistics
on the copy process. When used, it should be a dictionary
with keys 'groups'
, 'leaves'
and 'bytes'
having a numeric value.
Their values will be incremented to reflect the number of
groups, leaves and bytes, respectively,
that have been copied during the operation.
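For instance, a stats dictionary can be used to find out how much data a recursive copy moved. A sketch, assuming h5file is an open file that already contains /group1 and /backup groups:

```python
stats = {'groups': 0, 'leaves': 0, 'bytes': 0}
# Recursively copy '/group1' into '/backup', collecting statistics.
h5file.root.group1._f_copy(h5file.root.backup, recursive=True,
                           stats=stats)
# The counters have been incremented in place.
print "copied %(groups)d groups, %(leaves)d leaves, %(bytes)d bytes" % stats
```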
Returns an iterator yielding all the object nodes hanging from this instance. The nodes are alphanumerically sorted by node name. If a classname parameter is supplied, only instances of that class (or subclasses of it) are yielded.
Returns a list with all the object nodes hanging from this instance. The list is alphanumerically sorted by node name. If a classname parameter is supplied, only instances of that class (or subclasses of it) are returned.
Iterate over the list of Groups (not Leaves) hanging from (and including) self. This Group is listed first (pre-order), then each of its child Groups (following an alpha-numerical order) is also traversed, following the same procedure.
Iterate over the nodes in the Group
instance. It takes two parameters:
(String) If supplied, only instances of this class are returned.
(Integer) If false, only children hanging directly from the group are returned. If true, a recursion over all the groups hanging from it is performed.
Example of use:
# Recursively print all the arrays hanging from '/'
print "Arrays in the object tree '/':"
for array in h5file.root._f_walkNodes("Array", recursive=1):
    print array
Close this node in the tree.
This method has the behavior described in Node._f_close()
(see description).
It should be noted that this operation disables access
to nodes descending from this group.
Therefore, if you want to explicitly close them,
you will need to walk the nodes hanging from this group
before closing it.
Copy the children of this group into another group.
Children hanging directly from this group are copied into dstgroup
,
which can be a Group
(see 4.4)
object or its pathname in string form.
If createparents
is true,
any needed groups in the given destination path will be created.
The operation will fail with a NodeError
if there is a child node in the destination group
with the same name as one of the copied children of this one,
unless overwrite
is true;
in that case, the existing child node is recursively removed
before copying the latter.
By default, nodes descending from children groups of this node
are not copied. If the recursive
argument is true,
all descendant nodes of this node are recursively copied.
Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may or may not be copied, data may be sub-sampled, stats may be collected, etc. Arguments unknown to nodes are simply ignored. Check the documentation for copying operations of nodes to see which options they support.
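A short sketch, assuming h5file is an open file that already contains /group1 and /group2:

```python
# Copy every child of '/group1' into '/group2', recursively,
# replacing any clashing children already present in '/group2'.
h5file.root.group1._f_copyChildren('/group2', recursive=True,
                                   overwrite=True)
```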
The following methods automatically
trigger actions when a Group
instance is
accessed in a special way.
Set a Python attribute called name
with the given value
.
This method stores an ordinary Python attribute in the object.
It does not store new children nodes under this group;
for that, use the File.create*()
methods
(see 4.2).
Nor does it store a PyTables node attribute;
for that, use File.setNodeAttr()
(see description),
Node._f_setAttr()
(see description)
or Node._v_attrs
(see _v_attrs).
If there is already a child node with the same name
,
a NaturalNameWarning
will be issued
and the child node will not be accessible
via natural naming or getattr()
.
It will still be available via File.getNode()
(see description),
Group._f_getChild()
(see description)
and children dictionaries in the group (if visible).
Get a Python attribute or child node called name
.
If the object has a Python attribute called name
,
its value is returned.
Else, if the node has a child node called name
,
it is returned.
Else, an AttributeError
is raised.
Delete a Python attribute called name
.
This method deletes an ordinary Python attribute
from the object.
It does not remove children nodes from this group;
for that, use File.removeNode()
(see description)
or Node._f_remove()
(see description).
Nor does it delete a PyTables node attribute;
for that, use File.delNodeAttr()
(see description),
Node._f_delAttr()
(see description)
or Node._v_attrs
(see _v_attrs).
If both an attribute and a child node
with the same name
existed,
the child node will be made accessible again via natural naming.
Is there a child with that name?
Returns True
if the group has a child node
(visible or hidden)
with the given name (a string),
False
otherwise.
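For example (group is a hypothetical Group instance):

```python
# __contains__ lets the 'in' operator test for a child node by name.
if 'table1' in group:
    print "the group has a child (visible or hidden) named 'table1'"
else:
    print "no such child"
```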
Iterate over the children of the group instance. This iterator accepts no parameters and is not recursive.
Example of use:
# Non-recursively list all the nodes hanging from '/detector'
print "Nodes in '/detector' group:"
for node in h5file.root.detector:
    print node
Prints a short description of the Group
object.
Example of use:
>>> f = tables.openFile("data/test.h5")
>>> print f.root.group0
/group0 (Group) 'First Group'
>>>
The goal of this class is to provide a place to put common
functionality of all its descendants as well as provide a
way to help classifying objects on the tree. A
Leaf
object is an end-node, that is, a node
that can hang directly from a group object, but that is not
a group itself and thus cannot have descendants. Right
now, the set of end-nodes is composed of Table
,
Array
, CArray
, EArray
,
VLArray
and UnImplemented
class
instances. In fact, all the previous classes inherit from
the Leaf
class.
These instance variables are provided in addition to those
in Node
(see 4.3).
The shape of data in the leaf.
The byte ordering of data in the leaf.
Filter properties for this leaf —see 4.17.1.
The name of this node in its parent group (a string). An
alias for Node._v_name
.
The name of this node in the hosting HDF5 file (a string).
An alias for
Node._v_hdf5name
.
The identifier of this node in the hosting HDF5 file. An
alias for Node._v_objectID
.
The associated
AttributeSet
instance (see
4.15).
An alias for Node._v_attrs
.
A description for this node. An alias for
Node._v_title
.
Flush pending data to disk.
Saves any remaining buffered data to disk. It also
releases I/O buffers, so if you are filling many
objects (e.g. tables) in the same PyTables session,
call flush()
frequently to
help PyTables keep memory requirements low.
Close this node in the tree.
This method has the behavior described in Node._f_close()
(see description).
Besides that, the optional argument flush
tells
whether to flush pending data to disk or not before closing.
Remove this node from the hierarchy.
This method
has the behavior described in
Node._f_remove()
(see description). Note that
there is no recursive
flag, since leaves do
not have child nodes.
Copy this node and return the new one.
This method has the behavior described in
Node._f_copy()
(see description). Note that
there is no recursive
flag, since leaves do
not have child nodes. In addition, this method
recognizes the following keyword arguments:
The new title for the destination.
If omitted or None
, the original title is used.
Specifying this parameter overrides the original
filter properties in the source node.
If specified, it must be an instance of the Filters
class (see Section 4.17.1).
The default is to copy the filter properties
from the source node.
You can prevent the user attributes from being copied
by setting this parameter to False
.
The default is to copy them.
Specify the range of rows in child leaves to be copied; the default is to copy all the rows.
This argument may be used to collect statistics
on the copy process. When used, it should be a dictionary
with keys 'groups'
, 'leaves'
and 'bytes'
having a numeric value.
Their values will be incremented to reflect the number of
groups, leaves and bytes, respectively,
that have been copied during the operation.
Rename this node in place.
This method has the behavior described in Node._f_rename()
(see description).
Move or rename this node.
This method has the behavior described in Node._f_move()
(see description).
Is this node visible?
This method has the behavior described in Node._f_isVisible()
(see description).
Get a PyTables attribute from this node.
This method has the behavior described in
Node._f_getAttr()
(see description).
Set a PyTables attribute for this node.
This
method has the behavior described in
Node._f_setAttr()
(see description).
Delete a PyTables attribute from this node.
This method has the behavior described in
Node._f_delAttr()
(see description).
Instances of this class represent table objects in the object tree. They provide methods to read and write data from and to table objects in the file.
Data can be read from or written to tables by accessing
a special object that hangs from Table
. This
object is an instance of the Row
class (see
4.6.4). See the tutorial
sections Chapter 3 on how to use the
Row
interface. The columns of the tables can
also be easily accessed (and more specifically, they can be
read but not written) by making use of the
Column
class, through the use of an
extension of the natural naming schema applied
inside the tables. See the Section 4.9 for some examples of
use of this capability.
Note that this object inherits all the public attributes
and methods that Leaf
already has.
Finally, during the description of the different methods,
there will appear references to a particular object called
NestedRecArray
. This inherits from
numarray.records.RecArray
and is designed to
keep columns that have nested datatypes. See Appendix B for more information on
these objects.
A
Description
(see 4.8)
instance describing the structure of this table.
The associated
Row
instance
(see 4.6.4).
The number of rows in this table.
The size in bytes of each row in the table.
A Cols
(see
Section 4.7) instance
that serves as an accessor to
Column
(see Section 4.9) objects.
A tuple containing the (possibly nested) names of the columns in the table.
Maps the name of a column to its datatype.
Maps the name of a column to its data string type.
Maps the name of a column to its shape.
Maps the name of a column to the size of its base items.
Maps the name of a column to its default.
A dictionary that maps column names to whether the column is indexed.
Does this table have any indexed columns?
Index properties for this table (an
IndexProps
instance, see
4.17.2).
The default flavor for this table. This determines the
type of objects returned during input (i.e. read) operations. It can
take the "numarray" (default) or
"numpy" values. Its value is derived from the
_v_flavor
attribute of the
IsDescription
metaclass
(see 4.16.1) or, if the table has been created directly from a
numarray
or
NumPy
object, the flavor is
set to the appropriate value.
Get the enumerated type associated with the named column.
If the column named colname
(a string)
exists and is of an enumerated type, the corresponding
Enum
instance (see 4.17.4) is returned. If it is
not of an enumerated type, a TypeError
is
raised. If the column does not exist, a
KeyError
is raised.
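A sketch, assuming table has an enumerated column named color with a 'red' member (both names are hypothetical):

```python
colors = table.getEnum('color')   # the Enum instance behind the column
red = colors.red                  # the concrete value of the 'red' name
```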
Append a series of rows to this Table
instance. rows is an object that can keep the
rows to be appended in several formats, like a
NestedRecArray
(see Appendix B), a
RecArray
, a NumPy
object, a
list of tuples, list of
Numeric
/numarray
/NumPy
objects, string, Python buffer or None
(no append will result). Of course, this rows
object has to be compliant with the underlying format
of the Table
instance or a
ValueError
will be raised.
Example of use:
from tables import *

class Particle(IsDescription):
    name = StringCol(16, pos=1)    # 16-character String
    lati = IntCol(pos=2)           # integer
    longi = IntCol(pos=3)          # integer
    pressure = Float32Col(pos=4)   # float (single-precision)
    temperature = FloatCol(pos=5)  # double (double-precision)

fileh = openFile("test4.h5", mode="w")
table = fileh.createTable(fileh.root, 'table', Particle, "A table")
# Append several rows in only one call
table.append([("Particle:     10", 10, 0, 10*10, 10**2),
              ("Particle:     11", 11, -1, 11*11, 11**2),
              ("Particle:     12", 12, -2, 12*12, 12**2)])
fileh.close()
Get a column from the table.
If a column called name
exists in the
table, it is read and returned as a
numarray
object, or as a NumPy
object (whatever is more appropriate depending on the
flavor of the table). If it does not exist, a
KeyError
is raised.
Example of use:
narray = table.col('var2')
That statement is equivalent to:
narray = table.read(field='var2')
Here you can see how this method can be used as a
shorthand for the read()
(see description) method.
Returns an iterator yielding Row
(see Section 4.6.4) instances built
from rows in table. If a range is supplied (i.e. some of
the start, stop or step
parameters are passed), only the appropriate rows are
returned. Else, all the rows are returned. See also the
__iter__()
special method in Section 4.6.3 for a shorter
way to call this iterator.
The meaning of the start, stop and
step parameters is the same as in the
range()
python function, except that
negative values of step
are not
allowed. Moreover, if only start
is
specified, then stop
will be set to
start+1
. If you specify neither
start nor stop, then all the rows in the object are
selected.
Example of use:
result = [ row['var2'] for row in table.iterrows(step=5) if row['var1'] <= 20 ]
Note: This iterator can be nested (see example in description).
Iterate over a sequence of row coordinates.
Can be any object that
supports the __getitem__
special
method, like lists, tuples, Numeric/NumPy/numarray
objects, etc.
If true, the sequence will be sorted so that the I/O process gets better performance. If your sequence is already sorted, or you do not want it sorted, set this parameter to 0. The default is to sort the sequence.
Note: This iterator can be nested (see example in description).
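A sketch, assuming table has at least 46 rows and a var2 column (both hypothetical):

```python
# Read the 'var2' field of rows 2, 10 and 45 only; the sequence is
# sorted before the I/O takes place (the default behaviour).
values = [ row['var2'] for row in table.itersequence([45, 2, 10]) ]
```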
Returns the actual data in the Table
. If
field is not supplied, it returns the data as a
NestedRecArray
(see Appendix B) object.
The meaning of the start, stop and
step parameters is the same as in the
range()
python function, except that
negative values of step are not
allowed. Moreover, if only start is
specified, then stop will be set to
start+1. If you specify neither
start nor stop, then all the rows in
the object are selected.
The rest of the parameters are described next:
If specified, only the column
field is returned as a homogeneous
numarray
/NumPy
/Numeric
object, depending on the flavor. If this is
not supplied, all the fields are selected and a
NestedRecArray
(see Appendix B) or
NumPy
object is returned. Nested
fields can be specified in the field
parameter by using a '/'
character as a
separator between fields
(e.g. Info/value
).
Passing a flavor
parameter makes an additional conversion happen in
the returned object. flavor can
take any of the following values: "numarray"
, "numpy"
, "python"
or
"numeric"
(the latter only if field has
been specified). If flavor is not
specified, it takes the value of
self.flavor
.
Read a set of rows given their indexes into an in-memory object.
This method works much like the read()
method (see description), but it uses
a sequence (coords
) of row indexes to
select the wanted columns, instead of a column range.
It returns the selected rows in a
NestedRecArray
object (see Appendix B). If
flavor
is provided, an additional
conversion to an object of this flavor is made, just
as in read()
.
Modify a series of rows in the
[start:stop:step]
extended slice
range. If you pass None
to stop,
all the rows existing in rows will be used.
rows can be either a recarray or a structure that can be converted into one and that is compliant with the table format.
Returns the number of modified rows.
It raises a ValueError
if the rows
parameter cannot be converted to an object compliant
with the table description,
and an IndexError
if the
modification would exceed the length of the table.
Modify a series of rows in the
[start:stop:step]
extended slice
row range. If you pass None
to stop,
all the rows existing in column will be used.
column can be either a
NestedRecArray
(see Appendix B),
RecArray
, numarray
,
NumPy
object, list or tuple that is able
to be converted into a NestedRecArray
compliant with the specified colname column
of the table.
colname specifies the column name of the table to be modified.
Returns the number of modified rows.
It raises a ValueError
if the
column parameter cannot be converted into an
object compliant with the column
description,
and an IndexError
if the
modification would exceed the length of the table.
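A sketch, assuming table has a pressure column and at least five rows (both hypothetical):

```python
# Overwrite rows 0, 2 and 4 of the 'pressure' column in one call.
table.modifyColumn(start=0, stop=5, step=2,
                   colname='pressure', column=[1.1, 2.2, 3.3])
```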
Modify a series of rows in the
[start:stop:step]
extended slice
row range. If you pass None
to stop,
all the rows existing in columns will be used.
columns can be either a
NestedRecArray
(see Appendix B),
RecArray
, a NumPy
object, a
list of arrays or list or tuples (the columns) that
are able to be converted to a
NestedRecArray
compliant with the
specified column names subset of the table
format.
names specifies the column names of the table to be modified.
Returns the number of modified rows.
It raises a ValueError
if the
columns parameter cannot be converted to an
object compliant with the table description,
and an IndexError
if the
modification would exceed the length of the table.
Removes a range of rows from the table. If only start is supplied, only that row is removed. If a range is supplied, i.e. both the start and stop parameters are passed, all the rows in the range are removed. A step parameter is not supported, and it is not foreseen to be implemented anytime soon.
Sets the starting row to be removed. It accepts negative values meaning that the count starts from the end. A value of 0 means the first row.
Sets the last row to be
removed to stop - 1, i.e. the end point is
omitted (in the Python range
tradition). Like start, it accepts
negative values. A special value of
None
(the default) means removing just
the row supplied in start.
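A sketch, assuming table has enough rows (the negative-start form follows the convention described above):

```python
table.removeRows(5)        # remove just row 5
table.removeRows(0, 10)    # remove rows 0 through 9
table.removeRows(-1)       # remove the last row (negative start)
```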
Remove the index associated with the specified column.
The argument colname should be the name of a column.
If the column is not indexed, nothing happens.
If it does not exist, a KeyError
is raised.
This index can be created again by calling
the createIndex()
(see description)
method of the appropriate
Column
object.
Add remaining rows in buffers to non-dirty indexes. This can be useful when you have chosen non-automatic indexing for the table (see Section 4.17.2) and want to update the indexes on it.
Recompute all the existing indexes in table. This can be useful when you suspect that, for any reason, the index information for columns is no longer valid and want to rebuild the indexes on it.
Recompute the existing indexes in table, but
only if they are dirty. This can be useful when
you have set the reindex
parameter to 0 in
IndexProps
constructor (see description) for the table and
want to update the indexes after an index-invalidating
operation (Table.removeRows
, for example).
Iterate over values fulfilling a condition
.
This method returns an iterator yielding
Row
(see 4.6.4)
instances built from rows in the table that satisfy the
given condition
over a column. If that
column is indexed, its index will be used in order to
accelerate the search. Else, the in-kernel
iterator (which still has better performance than
standard Python selections) will be chosen
instead. Check Section 5.2 for more information
about the performance of the different searching modes.
Moreover, if a range is supplied (i.e. some of the
start
, stop
or
step
parameters are passed), only the rows
in that range and fulfilling the
condition
are returned. The meaning of the
start
, stop
and
step
parameters is the same as in the
range()
Python function, except that
negative values of step
are not
allowed. Moreover, if only start
is
specified, then stop
will be set to
start+1
.
You can mix this method with standard Python selections in order to have complex queries. It is strongly recommended that you pass the most restrictive condition as the parameter to this method if you want to achieve maximum performance.
Example of use:
passvalues = []
for row in table.where(0 < table.cols.col1 < 0.3, step=5):
    if row['col2'] <= 20:
        passvalues.append(row['col3'])
print "Values that pass the cuts:", passvalues
Note that, from PyTables
1.1 on, you can
nest several iterators over the same table. For example:
for p in rout.where(rout.cols.pressure < 16):
    for q in rout.where(rout.cols.pressure < 9):
        for n in rout.where(rout.cols.energy < 10):
            print "pressure, energy:", p['pressure'], n['energy']
In this example, iterators returned by
where()
have been used,
but you may as well use any of the other reading iterators that the
Table
object offers. Look at
examples/nested-iter.py
for the full code.
Append rows fulfilling the condition to the dstTable table.
dstTable must be capable of taking the rows
resulting from the query, i.e. it must have columns with
the expected names and compatible types. The meaning of
the other arguments is the same as in the
where()
method (see description).
The number of rows appended to dstTable is returned as a result.
Get the row coordinates that fulfill the condition parameter. This method will take advantage of an indexed column to speed-up the search.
flavor is the desired type of the returned
list. It can take the "numarray"
,
"numpy"
, "numeric"
or
"python"
values. The default is to return
an object of the same flavor as
self.flavor
.
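The coordinates returned by this method can be fed straight into readCoordinates(). A sketch, assuming table has a pressure column (a hypothetical name):

```python
# Get the row coordinates matching the condition as a plain Python
# list, then read just those rows.
coords = table.getWhereList(table.cols.pressure < 16, flavor="python")
rows = table.readCoordinates(coords)
```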
The following methods automatically
trigger actions when a Table
instance is
accessed in a special way (e.g.,
table["var2"]
is equivalent to a call to
table.__getitem__("var2")
).
It returns the same iterator as
Table.iterrows(0,0,1)
, but accepts
no parameters.
Example of use:
result = [ row['var2'] for row in table if row['var1'] <= 20 ]
Which is equivalent to:
result = [ row['var2'] for row in table.iterrows() if row['var1'] <= 20 ]
Note: This iterator can be nested (see example in description).
Get a row or a range of rows from the table.
If the key
argument is an integer, the
corresponding table row is returned as a
numarray.records.Record
or as a tables.nestedrecords.NestedRecord
object, whichever is more appropriate.
If key
is a slice, the range of rows
determined by it is returned as a
numarray.records.RecArray
or as a tables.nestedrecords.NestedRecArray
object, whichever is more appropriate.
Using a string as key
to get a column is
supported but deprecated. Please use the
col()
(see description)
method.
Example of use:
record = table[4]
recarray = table[4:1000:2]
Those statements are equivalent to:
record = table.read(start=4)[0]
recarray = table.read(start=4, stop=1000, step=2)
Here you can see how indexing and slicing can be used
as shorthands for the read()
(see description) method.
It takes different actions depending on the
type of the key
parameter:
Integer
The corresponding table row is set to
value. value must be a
List
or
Tuple
capable of being
converted to the table field format.
Slice
The row slice
determined by key is set to
value. value must be a
NestedRecArray
object or a
RecArray
object or a list of rows capable
of being converted to the table field format.
Example of use:
# Modify just one existing row
table[2] = [456,'db2',1.2]
# Modify two existing rows
rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
                              formats="i4,a3,f8")
table[1:3:2] = rows
Which is equivalent to:
table.modifyRows(start=2, rows=[456,'db2',1.2])
rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
                              formats="i4,a3,f8")
table.modifyRows(start=1, step=2, rows=rows)
This class is used to fetch and set values on the table fields. It works very much like a dictionary, where the keys are the field names of the associated table and the values are the values of those fields in a specific row.
This object turns out to actually be an extension type, so you won't be able to access its documentation interactively. However, you will be able to access some of its internal attributes through the use of Python properties. In addition, there are some important methods that are useful for adding and modifying values in tables.
Property that returns the current row number in the table. It is useful to know which row is being dealt with in the middle of a loop or iterator.
Once you have filled the proper fields for the current row, calling this method appends these new data to disk (actually, data are first written to the output buffer).
Example of use:
row = table.row
for i in xrange(nrows):
    row['col1'] = i-1
    row['col2'] = 'a'
    row['col3'] = -1.0
    row.append()
table.flush()
Please, note that, after the loop in which
Row.append()
has been
called, it is always convenient to make a call to
Table.flush()
in order to
avoid losing the last rows, which may still be in internal buffers.
This allows you to modify values of your tables when you are in
the middle of table iterators, like
Table.iterrows()
(see
description) or
Table.where()
(see description). Once you
have filled the proper fields for the current row, calling this method
commits these data to disk (actually, data are first written to
the output buffer).
Example of use:
for row in table.iterrows(step=10):
    row['col1'] = row.nrow
    row['col2'] = 'b'
    row['col3'] = 0.0
    row.update()
which modifies every tenth row in table. Or:
for row in table.where(table.cols.col1 > 3):
    row['col1'] = row.nrow
    row['col2'] = 'b'
    row['col3'] = 0.0
    row.update()
which only updates the rows whose values in the first column are bigger than 3.
This class is used as an accessor to the table
columns following the natural naming convention:
for each column there exists
an attribute with the column's name,
which can be a Column
instance
(non-nested column) or another Cols
instance
(nested column).
Columns under a Cols
accessor can be accessed
as attributes of it. For instance, if
table.cols
is a Cols
instance with
a column named col1
under it, the latter can be
accessed as table.cols.col1
. If
col1
is nested and contains a col2
column, this can be accessed as
table.cols.col1.col2
and so on and so forth.
A list of the names of the columns (or nested columns)
hanging directly from this
Cols
instance. The order of
the names matches the order of their respective columns in the
containing table.
A list of the complete pathnames of the columns hanging
directly from this Cols
instance. If the table does not contain nested columns, this is exactly
the same as the _v_colnames
attribute.
The parent
Table
instance.
The associated Description (see Section 4.9) instance.
Return a handler to the colname column. If
colname is a nested column, a Cols
instance is returned. If colname is a
non-nested column, a Column
object is
returned instead.
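This is the programmatic equivalent of natural naming. A sketch, assuming table has a plain col1 column and a nested Info column (both hypothetical):

```python
col1 = table.cols._f_col('col1')  # a Column instance
info = table.cols._f_col('Info')  # a Cols instance (nested column)
```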
Get a row or a range of rows from the
Cols
accessor.
If the key
argument is an integer, the
corresponding Cols
row is returned as a
numarray.records.Record
or as a tables.nestedrecords.NestedRecord
object, whichever is more appropriate.
If key
is a slice, the range of rows
determined by it is returned as a
numarray.records.RecArray
or as a tables.nestedrecords.NestedRecArray
object, whichever is more appropriate.
Using a string as key
to get a column is
supported but deprecated. Please use the
col()
(see description)
method.
Example of use:
record = table.cols[4]                # equivalent to table[4]
recarray = table.cols.Info[4:1000:2]
Those statements are equivalent to:
nrecord = table.read(start=4)[0]
nrecarray = table.read(start=4, stop=1000, step=2).field('Info')
Here you can see how a mix of natural naming, indexing
and slicing can be used as shorthands for the
read()
(see description)
method.
Set a row or a range of rows to the Cols
accessor.
If the key
argument is an integer, the
corresponding Cols
row is set to the
value
object. If key
is a
slice, the range of rows determined by it is set to the
value
object.
Example of use:
table.cols[4] = record
table.cols.Info[4:1000:2] = recarray
Those statements are equivalent to:
table.modifyRows(4, rows=record)
table.modifyColumn(4, 1000, 2, colname='Info', column=recarray)
Here you can see how a mix of natural naming, indexing
and slicing can be used as shorthands for the
modifyRows()
and
modifyColumn()
(see description and description) methods.
The instances of the Description
class provide
a description of the structure of a table.
An instance of this class is automatically bound to
Table
(see 4.6)
objects when they are created. It provides a browseable
representation of the structure of the table, made of
non-nested (Col
—see 4.16.2) and nested
(Description
) columns. It also contains
information that will allow you to build
NestedRecArray
(see Appendix B) objects
suited for the different columns in a table (be they nested
or not).
Column descriptions (see Col
class in 4.16.2) under a description can be
accessed as attributes of it. For instance, if
table.description
is a Description
instance with a column named col1
under it, the
latter can be accessed as
table.description.col1
. If col1
is nested and contains a col2
column, this can
be accessed as table.description.col1.col2
.
The name of this description instance. If description is
the root of the nested type (or the description of a flat table), its
name will be the empty string
(''
).
A list of the names of the columns hanging directly from this description instance. The order of the names matches the order of their respective columns in the containing description.
A list of the pathnames of the columns hanging directly
from this description. If the table does not contain nested columns,
this is exactly the same as the
_v_names
attribute.
A nested list of the names of all the columns hanging
directly from this description instance. You can use this for the
names
argument of
NestedRecArray
factory
functions.
A nested list of the numarray string formats (and
shapes) of all the columns hanging directly from this description
instance. You can use this for the
formats
argument of
NestedRecArray
factory
functions.
A nested list of pairs of
(name, format)
tuples for
all the columns under this table or nested column. You can use this for
the descr
argument of
NestedRecArray
factory
functions.
A dictionary mapping the names of non-nested columns hanging directly from this description instance to their respective numarray types.
A dictionary mapping the names of non-nested columns hanging directly from this description instance to their respective string types.
A dictionary mapping the names of non-nested columns hanging directly from this description instance to their respective shapes.
A dictionary mapping the names of non-nested columns hanging directly from this description instance to their respective default values. Please, note that all the default values are kept internally as numarray objects.
A dictionary mapping the names of the columns hanging
directly from this description instance to their respective descriptions
(Col
—see
4.16.2— or
Description
—see
4.8
— instances).
A dictionary mapping the names of non-nested columns hanging directly from this description instance to their respective item size (in bytes).
The level of the description in the nested datatype.
Each instance of this class is associated with one column of every table. These instances are mainly used to fetch and set actual data from the table columns, but there are a few other associated methods to deal with indexes.
The parent Table
instance.
The name of the associated column.
The complete pathname of the associated column. This is
mainly useful in nested columns; for non-nested ones this value is the
same as name
.
The data type of the column.
The shape of the column.
The associated
Index
object
(see 4.17.3) to this
column (None
if it does not
exist).
Whether the index is dirty or not (property).
Create an Index
(see 4.17.3) object for this
column.
Recompute the index associated with this column. This can be useful when you suspect that, for any reason, the index information is no longer valid and want to rebuild it.
Recompute the existing index only if it is dirty. This
can be useful when you have set the reindex
parameter to 0 in IndexProps
constructor
(see description) for the
table and want to update the column's index after an
invalidating index operation
(Table.removeRows
, for example).
Delete the associated column's index. After doing that,
you will lose the indexing information on
disk. However, you can always re-create it using the
createIndex()
method (see description).
Returns a column element or slice. It takes different actions depending on the type of the key parameter:
Integer: The corresponding element in the column is returned as a scalar object or as a numarray object, depending on its shape.
Slice: The row range determined by this slice is returned as a numarray object.
Example of use:
print "Column handlers:"
for name in table.colnames:
    print table.cols[name]
print
print "Some selections:"
print "Select table.cols.name[1]-->", table.cols.name[1]
print "Select table.cols.name[1:2]-->", table.cols.name[1:2]
print "Select table.cols.lati[1:3]-->", table.cols.lati[1:3]
print "Select table.cols.pressure[:]-->", table.cols.pressure[:]
print "Select table.cols['temperature'][:]-->", table.cols['temperature'][:]
and the output of this for a certain arbitrary table is:
Column handlers:
/table.cols.name (Column(1,), CharType)
/table.cols.lati (Column(2,), Int32)
/table.cols.longi (Column(1,), Int32)
/table.cols.pressure (Column(1,), Float32)
/table.cols.temperature (Column(1,), Float64)

Some selections:
Select table.cols.name[1]--> Particle: 11
Select table.cols.name[1:2]--> ['Particle: 11']
Select table.cols.lati[1:3]--> [[11 12]
 [12 13]]
Select table.cols.pressure[:]--> [ 90. 110. 132.]
Select table.cols['temperature'][:]--> [ 100. 121. 144.]
See examples/table2.py
for a more
complete example.
It takes different actions depending on the
type of the key
parameter:
Integer: The corresponding element in the column is set to value. value must be a scalar or numarray/NumPy object, depending on the column's shape.
Slice: The row slice determined by key is set to value. value must be a list of elements or a numarray/NumPy object.
Example of use:
# Modify row 1
table.cols.col1[1] = -1
# Modify rows 1 and 3
table.cols.col1[1::2] = [2,3]
Which is equivalent to:
# Modify row 1
table.modifyColumns(start=1, columns=[[-1]], names=["col1"])
# Modify rows 1 and 3
columns = numarray.records.fromarrays([[2,3]], formats="i4")
table.modifyColumns(start=1, step=2, columns=columns, names=["col1"])
Represents an array on file. It provides methods to
write/read data to/from array objects in the file. This
class does not allow you to enlarge the datasets on disk;
see the EArray
descendant in Section 4.12 if you want
enlargeable dataset support and/or compression features.
See also CArray
in Section 4.11.
The array data types supported are the same as the set
provided by the numarray
package. For
details of these data types see Appendix A, or the
numarray
reference manual ([12]).
An interesting property of the Array
class
is that it remembers the flavor of the object
that has been saved so that if you saved, for example, a
List
, you will get a List
during
readings afterwards, or if you saved a NumPy
array, you will get a NumPy
object.
Note that this object inherits all the public attributes
and methods that Leaf
already provides.
The object representation for this array. It can be any of "numarray", "numpy", "numeric" or "python" values.
The length of the first dimension of the array.
On iterators, this is the index of the current row.
The type class of the represented array.
The string type of the represented array.
The size of the base items. Especially useful for
CharType
objects.
Note that, as this object has no internal I/O buffers, it
is not necessary to use the flush() method inherited from
Leaf
in order to save its internal state to
disk. When a writing method call returns, all the data is
already on disk.
Get the enumerated type associated with this array.
If this array is of an enumerated type, the
corresponding Enum
instance (see 4.17.4) is returned. If it is
not of an enumerated type, a TypeError
is
raised.
Returns an iterator yielding numarray
instances built from rows in the array. The returned rows are
taken from the first dimension in the case of an
Array
and CArray
instance and
the enlargeable dimension in the case of an
EArray
instance. If a range is supplied
(i.e. some of the start, stop or
step parameters are passed), only the
appropriate rows are returned. Else, all the rows are
returned. See also the __iter__()
special method in Section 4.10.3 for a shorter
way to call this iterator.
The meaning of the start, stop and
step parameters is the same as in the
range()
python function, except that
negative values of step
are not
allowed. Moreover, if only start
is
specified, then stop
will be set to
start+1
. If you specify neither
start nor stop, then all the rows in
the object are selected.
Example of use:
result = [ row for row in arrayInstance.iterrows(step=4) ]
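The start/stop/step rules described above can be sketched as a small pure-Python helper. This is an illustration of the documented semantics only, not PyTables code, and the function name is made up:

```python
def normalize_range(nrows, start=None, stop=None, step=1):
    """Model of the documented rules: negative step is rejected,
    a lone start selects a single row (stop = start + 1), and
    omitting both start and stop selects every row."""
    if step <= 0:
        raise ValueError("only positive values of step are allowed")
    if start is None and stop is None:
        start, stop = 0, nrows      # neither given: all rows selected
    elif stop is None:
        stop = start + 1            # only start given: a single row
    elif start is None:
        start = 0
    return range(start, min(stop, nrows), step)

print(list(normalize_range(10, start=4)))                  # [4]
print(list(normalize_range(10, step=4)))                   # [0, 4, 8]
print(list(normalize_range(10, start=2, stop=8, step=2)))  # [2, 4, 6]
```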
Read the array from disk and return it as a
numarray
(default) object, or an object
with the same flavor that it was
originally saved with. It accepts start, stop and
step parameters to select rows (the first
dimension in the case of an Array
and
CArray
instance and the
enlargeable dimension in the case of an
EArray
) for reading.
The meaning of the start, stop and
step parameters is the same as in the
range()
python function, except that
negative values of step
are not
allowed. Moreover, if only start
is
specified, then stop
will be set to
start+1
. If you specify neither
start nor stop, then all the rows in
the object are selected.
The following methods automatically
trigger actions when an Array
instance is
accessed in a special way (e.g.,
array[2:3,...,::2]
will be equivalent to a
call to array.__getitem__(slice(2,3, None),
Ellipsis, slice(None, None, 2))
).
It returns the same iterator as
Array.iterrows(0,0,1)
. However, it does not
accept parameters.
Example of use:
result = [ row[2] for row in array ]
Which is equivalent to:
result = [ row[2] for row in array.iterrows(0, 0, 1) ]
It returns a numarray
(default) object (or
an object with the same flavor that it
was originally saved with) containing the slice of rows stated in the
key
parameter. The set of allowed tokens in
key
is the same as extended slicing in
python (the Ellipsis
token included).
Example of use:
array1 = array[4]                     # array1.shape == array.shape[1:]
array2 = array[4:1000:2]              # len(array2.shape) == len(array.shape)
array3 = array[::2, 1:4, :]
array4 = array[1, ..., ::2, 1:4, 4:]  # General slice selection
Sets an Array element, row or extended slice. It takes
different actions depending on the type of the
key
parameter:
key
is an integer: The
corresponding row is assigned to value. If needed,
this value
is broadcasted to fit the
specified row.
key
is a slice: The row
slice determined by it is assigned to
value
. If needed, this value
is broadcasted to fit in the desired range. If the
slice to be updated exceeds the actual shape of the
array, only the values in the existing range are
updated, i.e. the index error will be silently
ignored. If value
is a multidimensional
object, then its shape must be compatible with the
slice specified in key
, otherwise, a
ValueError
will be issued.
Example of use:
a1[0] = 333        # Assign an integer to an Integer Array row
a2[0] = "b"        # Assign a string to a string Array row
a3[1:4] = 5        # Broadcast 5 to slice 1:4
a4[1:4:2] = "xXx"  # Broadcast "xXx" to slice 1:4:2
# General slice update (a5.shape = (4,3,2,8,5,10))
a5[1, ..., ::2, 1:4, 4:] = arange(1728, shape=(4,3,2,4,3,6))
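The broadcasting and silent range clipping described above can be modeled for the one-dimensional case with plain Python lists. This is an illustrative sketch only; set_slice is a hypothetical helper, not part of PyTables:

```python
def set_slice(data, key, value):
    """Model of Array.__setitem__ for a 1-D array: the slice is
    clipped to the existing range (index errors silently ignored),
    and a scalar value is broadcast to fill the selected rows."""
    start, stop, step = key.indices(len(data))  # clips to actual shape
    targets = range(start, stop, step)
    if not hasattr(value, '__len__'):
        value = [value] * len(targets)          # broadcast the scalar
    elif len(value) != len(targets):
        raise ValueError("value shape not compatible with the slice")
    for i, v in zip(targets, value):
        data[i] = v
    return data

a = [0, 0, 0, 0]
print(set_slice(a, slice(1, 4), 5))         # [0, 5, 5, 5]
print(set_slice(a, slice(2, 100), [7, 8]))  # [0, 5, 7, 8] -- out-of-range
                                            # part silently ignored
```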
This is a child of the Array
class (see 4.10) and as such,
CArray
represents an array on the file. The
difference is that CArray
has a chunked layout
and, as a consequence, it also supports compression. You
can use this class to easily save or load array (or array
slices) objects to or from disk, with compression support
included.
In addition to the attributes that CArray
inherits from Array
, it supports some more
that provide information about the filters used.
An Atom
(see
4.16.3) instance
representing the shape, type and flavor of the atomic objects to be saved.
See below a small example of CArray
class. The code is available in
examples/carray1.py
.
import numarray
import tables

fileName = 'carray1.h5'
shape = (200,300)
atom = tables.UInt8Atom(shape = (128,128))
filters = tables.Filters(complevel=5, complib='zlib')

h5f = tables.openFile(fileName,'w')
ca = h5f.createCArray(h5f.root, 'carray', shape, atom, filters=filters)
# Fill a hyperslab in ca. The array will be converted to UInt8 elements
ca[10:60,20:70] = numarray.ones((50,50))
h5f.close()

# Re-open and read another hyperslab
h5f = tables.openFile(fileName)
print h5f
print h5f.root.carray[8:12, 18:22]
h5f.close()
The output for the previous script is something like:
carray1.h5 (File) ''
Last modif.: 'Thu Jun 16 10:47:18 2005'
Object Tree:
/ (RootGroup) ''
/carray (CArray(200L, 300L)) ''

[[0 0 0 0]
 [0 0 0 0]
 [0 0 1 1]
 [0 0 1 1]]
This is a child of the Array
class (see 4.10) and as such,
EArray
represents an array on the file. The
difference is that EArray
allows you to enlarge
datasets along any single dimension[13] you select. Another important difference is
that it also supports compression.
So, in addition to the attributes and methods that
EArray
inherits from Array
, it
supports a few more that provide a way to enlarge the
arrays on disk. The new variables
and methods are described below, as well as some that already exist in
Array
but differ somewhat in meaning
and/or functionality in the EArray
context.
An Atom
(see 4.16.3)
instance representing the shape, type and flavor of the atomic objects
to be saved. One of the dimensions of the shape is 0, meaning that the
array can be extended along it.
The enlargeable dimension, i.e. the dimension this array can be extended along.
The length of the enlargeable dimension of the array.
Get the enumerated type associated with this array.
If this array is of an enumerated type, the
corresponding Enum
instance (see 4.17.4) is returned. If it is
not of an enumerated type, a TypeError
is
raised.
Appends a sequence
to the underlying
dataset. Obviously, this sequence must have the same
type as the EArray
instance; otherwise a
TypeError
is issued. In the same way, the
dimensions of the sequence
have to conform to
those of EArray
, that is, all the
dimensions have to be the same except, of course, that of
the enlargeable dimension which can be of any length
(even 0!).
Example of use (code available in
examples/earray1.py
):
import tables
from numarray import strings

fileh = tables.openFile("earray1.h5", mode = "w")
a = tables.StringAtom(shape=(0,), length=8)
# Use 'a' as the object type for the enlargeable array
array_c = fileh.createEArray(fileh.root, 'array_c', a, "Chars")
array_c.append(strings.array(['a'*2, 'b'*4], itemsize=8))
array_c.append(strings.array(['a'*6, 'b'*8, 'c'*10], itemsize=8))

# Read the string EArray we have created on disk
for s in array_c:
    print "array_c[%s] => '%s'" % (array_c.nrow, s)
# Close the file
fileh.close()
and the output is:
array_c[0] => 'aa'
array_c[1] => 'bbbb'
array_c[2] => 'aaaaaa'
array_c[3] => 'bbbbbbbb'
array_c[4] => 'cccccccc'
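The dimension-conformance rule that append() is described as enforcing (all dimensions equal except the enlargeable one, which may have any length, even 0) can be sketched as a standalone check. The helper below is hypothetical and is shown only to make the rule concrete:

```python
def check_appendable(earray_shape, extdim, seq_shape):
    """Model of the shape check described for EArray.append():
    every dimension of the appended sequence must match the array's,
    except the enlargeable dimension extdim, which may be any length."""
    if len(seq_shape) != len(earray_shape):
        raise ValueError("rank mismatch")
    for dim, (a, b) in enumerate(zip(earray_shape, seq_shape)):
        if dim != extdim and a != b:
            raise ValueError("dimension %d does not conform" % dim)
    return True

# Enlargeable along dimension 0; any (n, 3) sequence conforms:
print(check_appendable((0, 3), 0, (5, 3)))  # True
print(check_appendable((0, 3), 0, (0, 3)))  # True -- even a 0-length append
```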
Instances of this class represent array objects in the
object tree with the property that their rows can have a
variable number of
(homogeneous) elements (called atomic objects, or
just atoms). Variable length arrays (or
VLA's for short), similarly to Table
instances, can have only one dimension, and, as with
Table
, the compound elements (the
atoms) of the rows of VLArrays
can be
fully multidimensional objects.
VLArray
provides methods to read/write data
from/to variable length array objects resident on disk.
Also, note that this object inherits all the public
attributes and methods that Leaf
already has.
An Atom
(see 4.16.3)
instance representing the shape, type and flavor of the atomic objects
to be saved.
On iterators, this is the index of the current row.
The total number of rows.
Get the enumerated type associated with this array.
If this array is of an enumerated type, the
corresponding Enum
instance (see 4.17.4) is returned. If it is
not of an enumerated type, a TypeError
is
raised.
Append objects in the sequence
to the array.
This method appends the objects in the sequence
to a single row in this array.
The type of individual objects must be compliant with
the type of atoms in the array.
In the case of variable length strings, the very string to append
is the sequence
.
Example of use (code available in
examples/vlarray1.py
):
import tables
from numpy import *  # or, from numarray import *

# Create a VLArray:
fileh = tables.openFile("vlarray1.h5", mode = "w")
vlarray = fileh.createVLArray(fileh.root, 'vlarray1',
                              tables.Int32Atom(flavor="numpy"),
                              "ragged array of ints",
                              Filters(complevel=1))
# Append some (variable length) rows:
vlarray.append(array([5, 6]))
vlarray.append(array([5, 6, 7]))
vlarray.append([5, 6, 9, 8])

# Now, read it through an iterator:
for x in vlarray:
    print vlarray.name+"["+str(vlarray.nrow)+"]-->", x

# Close the file
fileh.close()
The output of the previous program looks like this:
vlarray1[0]--> [5 6]
vlarray1[1]--> [5 6 7]
vlarray1[2]--> [5 6 9 8]
The objects
argument is only retained
for backwards compatibility; please do not
use it.
Returns an iterator yielding one row per iteration. If
a range is supplied (i.e. some of the start,
stop or step parameters are passed),
only the appropriate rows are returned. Else, all the
rows are returned. See also the __iter__()
special method in Section 4.13.3 for a
shorter way to call this iterator.
The meaning of the start, stop and
step parameters is the same as in the
range()
python function, except that
negative values of step
are not
allowed. Moreover, if only start
is
specified, then stop
will be set to
start+1
. If you specify neither
start nor stop, then all the rows in
the object are selected.
Example of use:
for row in vlarray.iterrows(step=4):
    print vlarray.name+"["+str(vlarray.nrow)+"]-->", row
Returns the actual data in VLArray
. As the
lengths of the different rows are variable, the returned
value is a python list, with as many entries as
rows specified by the range parameters.
The meaning of the start, stop and
step parameters is the same as in the
range()
python function, except that
negative values of step
are not
allowed. Moreover, if only start
is
specified, then stop
will be set to
start+1
. If you specify neither
start nor stop, then all the rows in
the object are selected.
The following methods automatically
trigger actions when a VLArray
instance is
accessed in a special way (e.g., vlarray[2:5]
will be equivalent to a call to
vlarray.__getitem__(slice(2,5,None)
).
It returns the same iterator as
VLArray.iterrows(0,0,1)
. However, it does
not accept parameters.
Example of use:
result = [ row for row in vlarray ]
Which is equivalent to:
result = [ row for row in vlarray.iterrows() ]
It returns the slice of rows determined by
key
, which can be an integer index or an
extended slice. The returned value is a list of objects
of type array.atom.type
.
Example of use:
list1 = vlarray[4]
list2 = vlarray[4:1000:2]
Updates a vlarray row described by keys
by
setting it to value
. Depending on the value
of keys
, the action taken is different:
keys
is an integer: It
refers to the number of the row to be modified. The
value
object must be type and shape
compatible with the object that exists in the vlarray
row.
keys
is a tuple: The
first element refers to the row to be modified, and
the second element to the range (so, it can be an
integer or a slice) of the row that will be
updated. As above, the value
object must
be type and shape compatible with the object specified
in the vlarray row and range.
Note: When updating
VLStrings
(UTF-8 codification) or
Objects
atoms, there is a problem: one can
only update values with exactly the same number of bytes
as in the original row. With UTF-8 encoding this is
problematic because, for instance, 'c'
takes 1 byte, but a non-ASCII character such as 'ç' takes two. The same
applies when using Objects
atoms, because
when cPickle is applied to a class instance (for example),
it is not guaranteed to return the same number of bytes
as for another instance, even one of the same class as
the former. These facts effectively limit the kinds of
objects that can be updated in VLArray
s.
Example of use:
vlarray[0] = vlarray[0]*2+3
vlarray[99,3:] = arange(96)*2+3
# Negative values for start and stop (but not step) are supported
vlarray[99,-99:-89:2] = vlarray[5]*2+3
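The byte-length constraint mentioned in the note above comes from UTF-8 being a variable-width encoding: strings of the same character length may occupy different numbers of bytes. A short pure-Python demonstration (not PyTables code; the helper function is made up):

```python
# UTF-8 is variable-width: 'c' encodes to 1 byte, 'ç' to 2.
s1 = 'c'
s2 = '\u00e7'  # 'ç', LATIN SMALL LETTER C WITH CEDILLA
print(len(s1.encode('utf-8')))  # 1
print(len(s2.encode('utf-8')))  # 2

def can_update_in_place(old, new):
    """An in-place update of a stored variable-length string is only
    safe when the encoded sizes match exactly."""
    return len(old.encode('utf-8')) == len(new.encode('utf-8'))

print(can_update_in_place('abc', 'xyz'))        # True  (3 bytes vs 3)
print(can_update_in_place('abc', 'ab\u00e7'))   # False (3 bytes vs 4)
```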
Instances of this class represent an unimplemented dataset
in a generic HDF5 file. When reading such a file (i.e. one
that has not been created with PyTables
, but
with some other HDF5 library based tool), chances are that
the specific combination of datatypes and/or
dataspaces in some dataset might not be supported
by PyTables
yet. In such a case, this dataset
will be mapped into the UnImplemented
class and
hence, the user will still be able to build the complete
object tree of this generic HDF5 file, as well as to access
(both read and write) the attributes
of this dataset and some metadata. Of course, the user won't
be able to read the actual data on it.
This is an elegant way to allow users to work with generic
HDF5 files despite the fact that some of their datasets may
not be supported by PyTables
. However, if you
are really interested in having access to an unimplemented
dataset, please, get in contact with the developer team.
This class does not have any public instance variables,
except those inherited from the Leaf
class
(see 4.5).
Represents the set of attributes of a node (Leaf or Group). It provides methods to create new attributes, open, rename or delete existing ones.
Like in Group
instances,
AttributeSet
instances make use of the
natural naming convention, i.e. you can access the
attributes on disk as if they were normal
AttributeSet
attributes. This offers the user
a very convenient way to access (but also to set and
delete) node attributes by simply specifying them like a
normal attribute class.
Caveat emptor: All Python
data types are supported. In particular, multidimensional
numarray
objects are saved natively as
multidimensional objects in the HDF5 file. Python strings
are also saved natively as HDF5 strings, and loaded back
as Python strings. However, the rest of the data types
including the Python scalar ones (i.e. Int, Long and
Float) and more general objects (like NumPy
or Numeric
) are serialized using
cPickle
, so you will be able to correctly
retrieve them only from a Python-aware HDF5 library. So,
if you want to save Python scalar values and be able to
read them with generic HDF5 tools, you should make use of
scalar numarray
objects (for example
numarray.array(1, type=numarray.Int64)
). In
the same way, attributes in HDF5 native files will be
always mapped into numarray
objects. Specifically, a multidimensional attribute will
be mapped into a multidimensional numarray
and an scalar will be mapped into a scalar
numarray
(for example, an attribute of type
H5T_NATIVE_LLONG
will be read and returned as
a numarray.array(X, type=numarray.Int64)
scalar).
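The native-versus-pickled storage rule described above can be modeled in a few lines. This is a simplified sketch: here only strings are treated as native, whereas PyTables also stores numarray objects (and, in later versions, other scalars) natively:

```python
import pickle

NATIVE_TYPES = (str, bytes)  # types stored natively in this sketch

def store_attr(value):
    """Model of the storage rule: natively supported values are kept
    as-is, anything else is serialized with pickle (cPickle in the
    Python 2 era this manual describes)."""
    if isinstance(value, NATIVE_TYPES):
        return ('native', value)
    return ('pickled', pickle.dumps(value))

def load_attr(kind, payload):
    """Reverse of store_attr: unpickle only what was pickled."""
    return payload if kind == 'native' else pickle.loads(payload)

kind, payload = store_attr('hello')      # a string: stored natively
print(kind)                              # native
kind, payload = store_attr([3, (1, 2)])  # a generic object: pickled
print(load_attr(kind, payload))          # [3, (1, 2)]
```

Note how a generic HDF5 tool reading such a file would only see the opaque pickled string, which is exactly the caveat the text warns about.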
One more warning: because of the various potential difficulties
in restoring a Python object stored in an attribute,
you may end up getting a cPickle
string
where a Python object is expected.
If this is the case, you may wish to run cPickle.loads()
on that string to get an idea of where things went wrong,
as shown in this example:
>>> import tables
>>>
>>> class MyClass(object):
...     foo = 'bar'
...
>>> # An object of my custom class.
... myObject = MyClass()
>>>
>>> h5f = tables.openFile('test.h5', 'w')
>>> h5f.root._v_attrs.obj = myObject  # store the object
>>> print h5f.root._v_attrs.obj.foo  # retrieve it
bar
>>> h5f.close()
>>>
>>> # Delete class of stored object and reopen the file.
... del MyClass, myObject
>>>
>>> h5f = tables.openFile('test.h5', 'r')
>>> print h5f.root._v_attrs.obj.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'str' object has no attribute 'foo'
>>> # Let us inspect the object to see what is happening.
... print repr(h5f.root._v_attrs.obj)
'ccopy_reg\n_reconstructor\np1\n(c__main__\nMyClass\np2\nc__builtin__\nobject\np3\nNtRp4\n.'
>>> # Maybe unpickling the string will yield more information:
... import cPickle
>>> cPickle.loads(h5f.root._v_attrs.obj)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'MyClass'
>>> # So the problem was not in the stored object,
... # but in the *environment* where it was restored.
... h5f.close()
The parent node instance.
List with all attribute names.
List with system attribute names.
List with user attribute names.
Note that this class defines the
__setattr__
,
__getattr__
and
__delattr__
and they work as
normally intended. Any scalar (string, int or float) attribute is
supported natively as an attribute. However,
(c)Pickle
is automatically
used so as to serialize other kind of objects (like lists, tuples, dicts,
small NumPy/Numeric/numarray objects, ...) that you might want to save. If
an attribute is set on a target node that already has a large number of
attributes, a
PerformanceWarning
will be
issued.
With these special methods, you can access, assign or delete attributes on disk just by using the following constructs:
leaf.attrs.myattr = "str attr"  # Set a string (native support)
leaf.attrs.myattr2 = 3          # Set an integer (native support)
leaf.attrs.myattr3 = [3,(1,2)]  # A generic object (Pickled)
attrib = leaf.attrs.myattr      # Get the attribute myattr
del leaf.attrs.myattr           # Delete the attribute myattr
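The natural naming machinery built on __setattr__, __getattr__ and __delattr__ can be modeled with a dictionary standing in for the on-disk attribute storage. This is an illustrative sketch, not the real AttributeSet:

```python
class AttributeSetSketch:
    """Toy model of natural naming: plain attribute access is
    redirected to an internal store (a dict here, HDF5 attributes
    in the real class)."""
    def __init__(self):
        # Bypass our own __setattr__ to create the backing store.
        object.__setattr__(self, '_store', {})
    def __setattr__(self, name, value):
        self._store[name] = value
    def __getattr__(self, name):
        # Only called when normal lookup fails, so no recursion.
        try:
            return self._store[name]
        except KeyError:
            raise AttributeError(name)
    def __delattr__(self, name):
        del self._store[name]

attrs = AttributeSetSketch()
attrs.myattr = "str attr"
print(attrs.myattr)   # str attr
del attrs.myattr      # the "stored" attribute is gone
```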
Copy the user attributes (as well as certain
system attributes) to the where object.
where must be a
Group
or
Leaf
instance.
Return a list of attribute names of the parent node.
attrset selects the attribute set to be used. A
user
value returns only the
user attributes and this is the default.
sys
returns only the system
attributes. all
returns
both the system and user attributes.
This section describes a series of classes that are meant to
declare the datatypes required for primary
PyTables
(like Table
or
VLArray
) objects.
This class is designed to be used as an easy, yet
meaningful way to describe the properties of
Table
objects through the definition of
derived classes that inherit properties from it.
In order to define such a class, you must declare it as
a descendant of IsDescription, with as many
attributes as columns you want in your table. The name of
each attribute will become the name of a column, and its
value will hold a description of it.
Ordinary columns can be described using instances of the
Col
(see Section 4.16.2) class. Nested
columns can be described by using classes derived from
IsDescription
or instances of it. Derived
classes can be declared in place (in which case the column
takes the name of the class) or referenced by name, and
they can have a _v_pos
special attribute
which sets the position of the nested column among its
sibling columns.
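The declare-columns-as-class-attributes idea can be sketched in plain Python. This is a toy model: the real IsDescription uses Col instances and richer machinery, while the column declarations below are just placeholder strings:

```python
class IsDescriptionSketch:
    """Toy model of declaring a table structure as a class: every
    ordinary class attribute becomes a column description."""
    @classmethod
    def _v_columns(cls):
        # Special attributes (leading underscore, dunders) are skipped.
        return {name: value for name, value in vars(cls).items()
                if not name.startswith('_')}

class Particle(IsDescriptionSketch):
    # Hypothetical declarations; real code would use Col subclasses
    # such as StringCol(16) or Float32Col().
    name = 'CharType(16)'
    pressure = 'Float32'
    temperature = 'Float64'

cols = Particle._v_columns()
print(sorted(cols))  # ['name', 'pressure', 'temperature']
```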
Once you have created a description object, you can pass
it to the Table
constructor, where all the
information it contains will be used to define the table
structure. See the Section 3.4 for an example on how
that works.
See below for a complete list of the special attributes
that can be specified to complement the
metadata of an IsDescription
class.
The flavor of the table. It can take "numarray" (default) or "numpy" values. This determines the type of objects returned during input (i.e. read) operations.
An instance of the
IndexProps
class (see
Section 4.17.2). You
can use this to alter the properties of the index creation process for
a table.
Sets the position of a possible nested column description among its sibling columns.
The Col
class is used as a means to declare
the different properties of a table column. In addition, a
series of descendant classes are offered in order to make
these column descriptions easier to the user. In general,
it is recommended to use these descendant classes, as they
are more meaningful when found in the middle of the code.
The type class of the column.
The string type of the column.
The string type, in
RecArray
format, of the
column.
The shape of the column.
The size of the base items. Especially useful for
StringCol
objects.
Whether this column is meant to be indexed or not.
The position of this column with regard to its column siblings.
The name of this column
The complete pathname of the column. This is mainly
useful in nested columns; for non-nested ones this value is the same as
_v_name
.
A description of the different constructors with their parameters follows:
Declare the properties of a
Table
column.
The data type for the column. All types listed in Appendix A are valid data types for columns. The type description is accepted both in string-type format and as a numarray data type.
An integer or a tuple that specifies the number of
dtype items for each element (or shape, for
multidimensional elements) of this column. For
CharType
columns, the
last dimension is used as the length of the character strings.
However, for this kind of objects, the use of
StringCol
subclass is
strongly recommended.
The default value for elements of this column. If the user does not supply a value for an element while filling a table, this default value will be written to disk. If the user supplies a scalar value for a multidimensional column, this value is automatically broadcasted to all the elements in the column cell. If dflt is not supplied, an appropriate zero value (or null string) will be chosen by default. Please, note that all the default values are kept internally as numarray objects.
By default, columns are arranged in memory following an alpha-numerical order of the column names. In some situations, however, it is convenient to impose a user defined ordering. The pos parameter allows the user to force the desired ordering.
Whether this column should be indexed for better performance in table selections.
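The pos-based ordering described above can be made concrete with a small sketch. Only the idea that an explicit pos overrides the default alphabetical order comes from the text; the exact tie-breaking below is an assumption for illustration:

```python
def order_columns(cols):
    """Model of the ordering rule: columns with an explicit pos come
    first, sorted by pos; the rest follow in alphabetical order."""
    positioned = sorted((c for c in cols if cols[c] is not None),
                        key=lambda c: cols[c])
    unpositioned = sorted(c for c in cols if cols[c] is None)
    return positioned + unpositioned

# pos values per column (None means "pos not specified"):
cols = {'zeta': 0, 'alpha': None, 'mid': 1, 'beta': None}
print(order_columns(cols))  # ['zeta', 'mid', 'alpha', 'beta']
```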
Declare a column to be of type
CharType
. The
length parameter sets the length of the strings.
The meaning of the other parameters are like in the
Col
class.
Define a column to be of type
Bool
. The meaning of the
parameters is the same as in the
Col
class.
Declare a column to be of type
IntXX
, depending on the
value of itemsize parameter, that sets the number
of bytes of the integers in the column. sign
determines whether the integers are signed or not. The meaning of the
other parameters is the same as in the
Col
class.
This class has several descendants:
Define a column to be of type
FloatXX
, depending on the
value of itemsize
. The
itemsize
parameter sets
the number of bytes of the floats in the column and the default is 8
bytes (double precision). The meaning of the other parameters is the
same as those in the Col
class.
This class has two descendants:
Define a column to be of type
ComplexXX
, depending on
the value of itemsize
.
The itemsize
parameter
sets the number of bytes of the complex types in the column and the
default is 16 bytes (double precision complex). The meaning of the
other parameters is the same as in the
Col
class.
ComplexCol
columns
and their descendants do not support indexing.
This class has two descendants:
Define a column to be of type Time. Two
kinds of time columns are supported depending on the value of
itemsize
: 4-byte signed
integer and 8-byte double precision floating point columns (the
default ones). The meaning of the other parameters is the same as
those in the Col
class.
Time columns have a special encoding in the HDF5 file. See Appendix A for more information on those types.
This class has two descendants:
Description of a column of an enumerated type.
Instances of this class describe a table column which stores
enumerated values. Those values belong to an enumerated type, defined
by the first argument
(enum
) in the constructor
of EnumCol
, which accepts
the same kinds of arguments as
Enum
(see 4.17.4). The
enumerated type is stored in the
enum
attribute of the
column.
A default value must be specified as the second argument
(dflt
) in the
constructor; it must be the name (a string) of
one of the enumerated values in the enumerated type. Once the column
is created, the corresponding concrete value is stored in its
dflt
attribute. If the
name does not match any value in the enumerated type, a
KeyError
is raised.
A numarray data type might be specified in order to determine
the base type used for storing the values of enumerated values in
memory and disk. The data type must be able to represent each and
every concrete value in the enumeration. If it is not, a
TypeError
is raised. The
default base type is unsigned 32-bit integer, which is sufficient for
most cases.
The stype
attribute
of enumerated columns is always
'Enum'
, while the
type
attribute is the
data type used for storing concrete values.
The shape, position and indexed attributes of the column are treated as with other column description objects (see 4.16.2).
The Atom
class is a descendant of the
Col
class (see 4.16.2) and is meant to declare the
different properties of the base element (also
known as atom) of CArray
,
EArray
and VLArray
objects. The
Atom
instances have the property that their
length is always the same. However, you can grow objects
along the extensible dimension in the case of
EArray
or put a variable number of them on a
VLArray
row. Moreover, the atoms are not
restricted to scalar values, and they can be fully
multidimensional objects.
A series of descendant classes are offered in order to make the use of these element descriptions easier. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of the code.
In addition to the variables that it inherits from the
Col
class, it has the next additional
attributes:
The object representation for this atom. See the description of the
Atom class constructor below for the possible values it can take.
A description of the different constructors with their parameters follows:
Define properties for the base elements of
CArray
,
EArray
and
VLArray
objects.
The data type for the base element. See Appendix A for the list of supported data types. The type description is accepted both in string-type format and as a numarray data type.
In an EArray context, it is a tuple specifying the shape of the
object, and one (and only one) of its dimensions must be 0, meaning
that the EArray object will be enlarged along this axis. In the case
of a VLArray, it can be an integer with a value of 1 (one) or a
tuple, which specifies whether the atom is a scalar (in the case of
a 1) or has multiple dimensions (in the case of a tuple). For
CharType elements, the last dimension is used as the length of the
character strings. However, for this kind of object, the use of the
StringAtom subclass is strongly recommended.
The object representation for this atom. It can be
any of "numarray", "numpy"
or "python" for the character types and
"numarray", "numpy",
"numeric" or "python" for
the numerical types. If specified, the read atoms will be converted
to that specific flavor. If not specified, the atoms will remain in
their native format (i.e.
numarray
).
Define an atom to be of
CharType
type. The
meaning of the shape parameter is the same as in
the Atom
class.
length sets the length of the string atoms.
flavor can be either
"numarray"
,
"numpy"
or
"python"
. Unicode strings
are not supported by this type; see the
VLStringAtom
class if you
want Unicode support (only available for
VLArray
objects).
Define an atom to be of type
Bool
. The meaning of the
parameters is the same as in the
Atom
class.
Define an atom to be of type
IntXX
, depending on the
value of the itemsize parameter, which sets the number
of bytes of the integers that make up the atom.
sign determines whether the integers are signed
or not. The meaning of the other parameters is the same as in
the Atom
class.
This class has several descendants:
Define an atom to be of
FloatXX
type, depending
on the value of itemsize
.
The itemsize
parameter
sets the number of bytes of the floats in the atom and the default is
8 bytes (double precision). The meaning of the other parameters is
the same as in the
Atom
class.
This class has two descendants:
Define an atom to be of
ComplexXX
type, depending
on the value of itemsize
.
The itemsize
parameter
sets the number of bytes of the complex values in the atom and the
default is 16 bytes (double precision complex). The meaning of the
other parameters is the same as in the
Atom
class.
This class has two descendants:
Define an atom to be of type
Time. Two kinds of time atoms are supported
depending on the value of
itemsize
: 4-byte signed
integer and 8-byte double precision floating point atoms (the default
ones). The meaning of the other parameters is the same as in
the Atom
class.
Time atoms have a special encoding in the HDF5 file. See Appendix A for more information on those types.
This class has two descendants:
Description of an atom of an enumerated type.
Instances of this class describe the atom type used by an array to store enumerated values. Those values belong to an enumerated type.
The meaning of the
enum
and
dtype
arguments is the
same as in EnumCol
(see
4.16.2). The
shape
and
flavor
arguments have the
usual meaning of other
Atom
classes (the
flavor
applies to the
representation of concrete read values).
Enumerated atoms also have
stype
and
type
attributes with the
same values as in
EnumCol
.
Now there come two special classes,
ObjectAtom and VLStringAtom, that actually do
not descend from Atom, but
whose goal is so similar that they should be described here. The
difference between them and the
Atom class and its descendant
classes is that these special classes do not allow multidimensional
atoms, nor multiple values per row. Neither can a flavor
be specified, as it is immutable (see below).
Caveat emptor: You are only allowed to use
these classes to create
VLArray
objects, not
CArray
and
EArray
objects.
This class is meant to fit any kind of
object in a row of a
VLArray instance by using
cPickle behind the
scenes. Because you can not foresee how long the output of the
cPickle serialization will be (i.e. the atom already has a
variable length), you can only fit one representative
of it per row. However, you can still pass several parameters to the
VLArray.append() method,
as they will be regarded as a tuple of compound
objects (the parameters), so that there is still only one object to be
saved in a single row. It does not accept parameters and its flavor is
automatically set to
"Object", so reading a
row always returns an arbitrary Python object.
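A sketch of the serialization step using the standard pickle module (cPickle in Python 2). The helper names are hypothetical, used only to illustrate how several parameters collapse into a single pickled tuple per row:

```python
import pickle  # cPickle in Python 2; used by PyTables behind the scenes

# Hypothetical helpers sketching the idea: several parameters are
# regarded as one tuple, so a single pickled object is stored per row.
def serialize_row(*objects):
    payload = objects[0] if len(objects) == 1 else objects
    return pickle.dumps(payload)

def deserialize_row(data):
    return pickle.loads(data)

deserialize_row(serialize_row([1, 2], {'a': 3}))  # -> ([1, 2], {'a': 3})
```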
You can regard ObjectAtom
types as an easy way to save an arbitrary number of generic Python
objects in a VLArray
object.
This class describes a row of the
VLArray
class, rather
than an atom. It differs from the
StringAtom
class in that
you can only add one instance of it to one specific row, i.e. the
VLArray.append()
method
only accepts one object when the base atom is of this type. Besides,
it supports Unicode strings (contrary to
StringAtom) because it
uses the UTF-8 encoding (this is why its
atomsize() method always
returns 1) when serializing to disk. It does not accept any parameter
and, because its flavor is automatically set to
"VLString", reading a
row always returns a Python string. See
Section D.3.5 if you are
curious about how this is implemented at the low level.
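The UTF-8 serialization can be sketched with plain Python. The helper names are hypothetical; the point is that a Unicode string becomes a variable-length run of single bytes, which is why atomsize() reports 1:

```python
# Hypothetical sketch of the VLStringAtom round-trip: the Unicode
# string is stored as a variable-length sequence of UTF-8 bytes,
# so each stored atom is a single byte (hence atomsize() == 1).
def to_disk(s):
    return s.encode('utf-8')

def from_disk(data):
    return data.decode('utf-8')

stored = to_disk(u'caf\xe9')
len(stored)        # 5 bytes for 4 characters: '\xe9' takes two bytes
from_disk(stored)  # -> u'caf\xe9'
```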
You can regard
VLStringAtom
types as an
easy way to save generic variable length strings.
This section lists classes that do not fit in any other section and that mainly serve ancillary purposes.
This class is meant to serve as a container that keeps
information about the filter properties associated with
the enlargeable leaves, that is Table
,
EArray
and VLArray
as well as
CArray
.
The public variables of Filters
are listed
below:
The compression level (0 means no compression).
The compression filter used (in case of compressed dataset).
Whether the shuffle filter is active or not.
Whether the fletcher32 filter is active or not.
There are no Filters
public methods, with the
exception of the constructor itself, which is described
next.
The parameters that can be passed to the
Filters
class constructor are:
Specifies a compression level for data. The allowed range is 0-9. A value of 0 disables compression and is the default, balancing compression effort against CPU consumption.
Specifies the compression
library to be used. Right now, "zlib"
(default), "lzo"
, "ucl"
and "bzip2"
values are supported. See
Section 5.3 for
some advice on which library is better suited to
your needs.
Whether or not to use the
shuffle filter present in the
HDF5
library. This is normally used to
improve the compression ratio (at the cost of
consuming a little bit more CPU time). A value of 0
disables shuffling and 1 makes it active. The default
value depends on whether compression is enabled;
if compression is enabled, shuffling defaults to
active, otherwise it is disabled.
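The shuffle transform itself lives inside the HDF5 library, but its effect can be sketched in pure Python: byte 0 of every item is grouped first, then byte 1, and so on, so that the often-equal high-order bytes of similar values end up adjacent, which helps the compressor. The function names below are illustrative only:

```python
# Pure-Python sketch of the shuffle transform (the real filter is
# implemented in the HDF5 library, not like this).
def shuffle(data, itemsize):
    # Group byte 0 of every item, then byte 1, and so on.
    return b''.join(data[i::itemsize] for i in range(itemsize))

def unshuffle(data, itemsize):
    nitems = len(data) // itemsize
    return b''.join(data[j::nitems] for j in range(nitems))

# Three little-endian 16-bit integers: 1, 2, 3
raw = b'\x01\x00\x02\x00\x03\x00'
shuffle(raw, 2)  # -> b'\x01\x02\x03\x00\x00\x00' (zero bytes grouped)
```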
Whether or not to use the fletcher32 filter in the HDF5 library. This is used to add a checksum on each data chunk. A value of 0 disables the checksum and it is the default.
Of course, you can also create an instance and then assign the attributes you want to change. For example:
import numarray as na
from tables import *

fileh = openFile("test5.h5", mode = "w")
atom = Float32Atom(shape=(0,2))
filters = Filters(complevel=1, complib = "lzo")
filters.fletcher32 = 1
arr = fileh.createEArray(fileh.root, 'earray', atom, "A growable array",
                         filters = filters)
# Append several rows in only one call
arr.append(na.array([[1., 2.], [2., 3.], [3., 4.]], type=na.Float32))
# Print information on that enlargeable array
print "Result Array:"
print repr(arr)
fileh.close()
This enforces the use of the LZO
library, a
compression level of 1 and a fletcher32 checksum filter
as well. See the output of this example:
Result Array:
/earray (EArray(3L, 2), fletcher32, shuffle, lzo(1)) 'A growable array'
  type = Float32
  shape = (3L, 2)
  itemsize = 4
  nrows = 3
  extdim = 0
  flavor = 'numarray'
  byteorder = 'little'
You can use this class to set/unset the properties in the
indexing process of a Table
column. To use
it, create an instance, and assign it to the special
attribute _v_indexprops
in a table description class
(see 4.16.1) or dictionary.
The public variables of IndexProps
are listed
below:
Whether an existing index should be updated or not after a table append operation.
Whether the table columns are to be re-indexed after an invalidating index operation.
The filter settings for the different
Table
indexes.
There are no IndexProps
public methods, with
the exception of the constructor itself, which is described
next.
The parameters that can be passed to the
IndexProps
class constructor are:
Specifies whether an existing index should be updated or not after a table append operation. The default is to enable automatic index updates.
Specifies whether the table
columns are to be re-indexed after an invalidating
index operation (like for example, after a
Table.removeRows
call). The default is to
reindex after operations that invalidate indexes.
Sets the filter properties for
Column
indexes. It has to be an instance
of the Filters
(see 4.17.1) class. A
None
value means that the default
settings for the Filters
object are
selected.
This class is used to keep the indexing information for
table columns. It is actually a descendant of the
Group
class, with
some added functionality.
It has no methods intended for the programmer's use, but it has some attributes that may be of interest.
The column object this index belongs to.
The type class for the index.
The size of the atomic items. Especially useful
for columns of
CharType
type.
The total number of elements in the index.
Whether the index is dirty or not.
The
Filters
(see
Section 4.17.1)
instance for this index.
Each instance of this class represents an enumerated type. The values of the type must be declared exhaustively and named with strings, and they might be given explicit concrete values, though this is not compulsory. Once the type is defined, it can not be modified.
There are three ways of defining an enumerated type. Each
one of them corresponds to the type of the only argument in the
constructor of Enum
:
Sequence of names: each enumerated value is named using a string, and its order is determined by its position in the sequence; the concrete value is assigned automatically:
>>> boolEnum = Enum(['True', 'False'])
Mapping of names: each
enumerated value is named by a string and given an explicit
concrete value. All of the concrete values must be different, or
a ValueError
will
be raised.
>>> priority = Enum({'red': 20, 'orange': 10, 'green': 0})
>>> colors = Enum({'red': 1, 'blue': 1})
Traceback (most recent call last):
...
ValueError: enumerated values contain duplicate concrete values: 1
Enumerated type: in that case, a copy of the original enumerated type is created. Both enumerated types are considered equal.
>>> prio2 = Enum(priority)
>>> priority == prio2
True
Please, note that names starting with
_
are not allowed,
since they are reserved for internal usage:
>>> prio2 = Enum(['_xx'])
Traceback (most recent call last):
...
ValueError: name of enumerated value can not start with ``_``: '_xx'
The concrete value of an enumerated value is obtained
by getting its name as an attribute of the Enum
instance
(see __getattr__()
) or as an item (see __getitem__()
).
This allows comparisons between enumerated values
and assigning them to ordinary Python variables:
>>> redv = priority.red
>>> redv == priority['red']
True
>>> redv > priority.green
True
>>> priority.red == priority.orange
False
The name of the enumerated value corresponding to a
concrete value can also be obtained by using the
__call__()
method of the enumerated type.
In this way you get the symbolic name to use it later
with __getitem__()
:
>>> priority(redv)
'red'
>>> priority.red == priority[priority(priority.red)]
True
(If you ask, the __getitem__()
method is
not used for this purpose to avoid ambiguity in the case
of using strings as concrete values.)
Get the concrete value of the enumerated value with that
name
.
The name
of
the enumerated value must be a string. If there is no value with
that name
in the
enumeration, a
KeyError
is
raised.
Get the concrete value of the enumerated value with that
name
.
The name
of
the enumerated value must be a string. If there is no value with
that name
in the
enumeration, an
AttributeError
is
raised.
Is there an enumerated value with that
name
in the
type?
If the enumerated type has an enumerated
value with that
name
,
True
is returned.
Otherwise, False
is
returned. The name
must be a string.
This method does
not check for concrete values matching a
value in an enumerated type. For that, please use the
__call__()
method.
Get the name of the enumerated value with that concrete
value
.
If there is no value with that concrete value in the
enumeration and a second argument is given as a
default, the default is
returned. Otherwise, a
ValueError is
raised.
This method can be used for checking that a concrete value belongs to the set of concrete values in an enumerated type.
Iterate over the enumerated values.
Enumerated values are returned as
(name, value)
pairs
in no particular order.
Is the other
enumerated type equivalent to this one?
Two enumerated types are equivalent if they have exactly the same enumerated values (i.e. with the same names and concrete values).