A tutorial on the costs of dealing with data using Vstr
This page is an advanced tutorial looking at the costs of the different
Vstr calls, given different data.
For a simpler starting tutorial see this page.
Moving data between Vstr_base objects
There are five basic methods of moving data from one Vstr to another:
- vstr_add_vstr( ..., VSTR_TYPE_ADD_DEF): This is the generic way
to move data, and does the obvious thing of copying all nodes. I.e. _BUF data
in the source is copied into _BUF data in the destination, while _NON, _PTR
and _REF data each get a new node with the _NON length, the pointer or the
reference.
- vstr_add_vstr( ..., VSTR_TYPE_ADD_BUF_REF): This converts all _BUF
nodes in the source to _REF nodes, and then works the same way as
VSTR_TYPE_ADD_DEF. This means that you have to use a _REF node and a Vstr_ref
for each _BUF node in the source, although the custom allocation policies in
the Vstr_conf help here. It also means that all of that data becomes
read-only, so any alterations will require more _BUF nodes to be allocated in
the source or destination.
- vstr_add_vstr( ..., VSTR_TYPE_ADD_BUF_PTR): This is like
VSTR_TYPE_ADD_BUF_REF, except that _BUF nodes in the source stay _BUF nodes
(so alterations are unaffected there) and Vstr_ref nodes are not used. However,
it also means that the source should not be altered before the destination has
been deleted, or the data in the destination may be altered or may even point
to invalid memory.
- vstr_add_vstr( ..., VSTR_TYPE_ADD_ALL_BUF): This option converts
all _PTR and _REF nodes into _BUF nodes, by copying the data in them.
- vstr_mov(): This function tries to move the data from the
source to the destination. If you move entire nodes of data, you haven't
deleted data from the beginning of a _BUF node that is at the beginning of the
Vstr (or you aren't moving that node), and the node sizes for _BUF nodes are
identical in the source and destination (or the Vstr knows you don't have any
_BUF nodes inside the source), then this operation is just a few pointer
assignments, whatever the amount of data. Assuming that you don't have _BUF
nodes or that both _BUF sizes are identical, this function will try to get as
close as possible to just doing the pointer assignments. However, the somewhat
common case of moving part of the data in a single _BUF node requires copying
the data in the source so that it is in three separate _BUF nodes, and then
moving the middle node to the destination. This can often require copying more
data than if the simple vstr_add_vstr( ..., VSTR_TYPE_ADD_DEF)
had been used.
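The trade-off in the vstr_mov() item can be made concrete with a toy cost
model. The sketch below is not Vstr code: toy_copy_cost() and
toy_split_move_cost() are hypothetical helpers, and the split is assumed to
leave the head of the node in place and copy the middle and tail pieces into
fresh nodes, which may differ from what the library actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cost model, NOT Vstr code: bytes copied when `len` bytes starting at
 * 0-based offset `off` are taken out of one buffer node of `node_len` bytes. */

/* Plain copy (in the spirit of VSTR_TYPE_ADD_DEF on a _BUF node):
 * only the requested bytes are copied. */
size_t toy_copy_cost(size_t node_len, size_t off, size_t len)
{
    assert(off <= node_len && len <= node_len - off);
    (void)node_len; /* only used for the range check above */
    (void)off;
    return len;
}

/* Split-then-move. ASSUMPTION: the head stays where it is, and the middle
 * and tail pieces are each copied into a new node before the middle node
 * is relinked into the destination. */
size_t toy_split_move_cost(size_t node_len, size_t off, size_t len)
{
    assert(off <= node_len && len <= node_len - off);
    if (off == 0 && len == node_len)
        return 0;              /* whole node: relinking only, no copying */
    return node_len - off;     /* middle piece plus tail piece */
}
```

Under these assumptions, taking 20 bytes at offset 10 out of a 100-byte node
copies 20 bytes with the plain copy but 90 bytes with the split, while moving
the whole node costs no copying at all.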
Deleting data from a Vstr_base object
There is only a single call to delete data from a Vstr,
vstr_del(), however it has different
speed characteristics depending on what it needs to delete. Much like
vstr_mov(), the worst case is deleting
data that doesn't go to the end of a _BUF node, with a single exception
described below. This requires a single memmove() of all the later data in the
node to the start of the deletion in that node. All node types can delete data
from the end of the node by just changing a variable, and nodes can be entirely
removed by changing pointers (although removing nodes from the middle of the
Vstr will destroy the iovec cache, which shouldn't be a large hit). All nodes,
apart from _BUF, can delete data from the beginning of the node, and there is
an optimisation so that deleting from the beginning of _BUF nodes directly at
the beginning of the Vstr requires only a variable alteration (this is a common
operation for IO: add at the end and delete from the beginning).
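The difference between the cheap and the expensive case can be seen in a
small stand-alone sketch. The struct toy_buf_node type and toy_buf_del()
function are hypothetical stand-ins for a single buffer node, not the real
Vstr node layout or API. The function returns how many bytes had to be moved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one buffer node; NOT the real Vstr node layout. */
struct toy_buf_node {
    char   data[64];
    size_t len;       /* bytes currently in use */
};

/* Delete `len` bytes at 0-based offset `off` and return the number of
 * bytes that had to be moved to close the gap. */
size_t toy_buf_del(struct toy_buf_node *node, size_t off, size_t len)
{
    size_t tail;

    assert(off <= node->len && len <= node->len - off);

    tail = node->len - off - len;  /* bytes after the deleted range */
    if (tail > 0)                  /* middle delete: one memmove() */
        memmove(node->data + off, node->data + off + len, tail);

    node->len -= len;              /* end delete: only this line matters */
    return tail;
}
```

Deleting a range that reaches the end of the node moves nothing and only
shrinks the length, while deleting from the middle moves every byte that
follows the deleted range.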
Deleting from the middle of a non-_BUF node is a somewhat faster way of
doing a delete to the end of the node followed by an _ADD. That means it
requires an allocation, which is why
vstr_del()
returns SUCCESS or FAILURE. This also applies to _BUF nodes when the
VSTR_CNTL_CONF_GET_FLAG_DEL_SPLIT
attribute is set in the configuration.
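Why a middle delete can fail is easiest to see with a pointer-style node.
The sketch below is a hypothetical model, not Vstr code: struct toy_ptr_node
and toy_ptr_del_middle() are invented for illustration. The node is split in
two around the deleted range, so a second node has to be allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a node that only points at external data. */
struct toy_ptr_node {
    struct toy_ptr_node *next;
    const char          *ptr;
    size_t               len;
};

/* Delete `len` bytes at 0-based offset `off`, strictly inside the node.
 * Returns 1 on success and 0 if the extra node could not be allocated,
 * in which case the original node is left untouched. */
int toy_ptr_del_middle(struct toy_ptr_node *node, size_t off, size_t len)
{
    struct toy_ptr_node *tail;

    assert(off > 0 && len > 0 && off < node->len && len < node->len - off);

    tail = malloc(sizeof *tail);   /* the allocation that can fail */
    if (!tail)
        return 0;

    tail->ptr  = node->ptr + off + len;    /* data after the deleted range */
    tail->len  = node->len - off - len;
    tail->next = node->next;

    node->len  = off;                      /* keep only the head */
    node->next = tail;
    return 1;
}
```

No data is copied here, only pointers and lengths change, but the call still
needs memory for the new node and so has to be able to report failure.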
James Antill
Last modified: Mon Jan 26 05:09:10 EST 2004