English

Queries and Indexes

Every datastore query uses an index, a table that contains the results for the query in the desired order. An App Engine application defines its indexes in a configuration file named index.yaml. The development web server automatically adds suggestions to this file as it encounters queries that do not yet have indexes configured. You can tune indexes manually by editing the file before uploading the application.

The index-based query mechanism supports most common kinds of queries, but it does not support some queries you may be used to from other database technologies. Restrictions on queries, and their explanations, are described below.

Introducing Queries

A query retrieves entities from the datastore that meet a set of conditions. The query specifies an entity kind, zero or more conditions based on entity property values (sometimes called "filters"), and zero or more sort order descriptions. When the query is executed, it fetches all entities of the given kind that meet all of the given conditions, sorted in the order described.

A query can also return just the keys of the result entities instead of the entities themselves.

The datastore Python API provides two interfaces for preparing and executing queries: the Query interface, which uses methods to prepare the query, and the GqlQuery interface, which uses a SQL-like query language called GQL to prepare the query from a query string. These interfaces are described in more detail in Creating, Getting and Deleting Data: Getting Entities Using a Query and the corresponding reference pages.

class Person(db.Model):
  first_name = db.StringProperty()
  last_name = db.StringProperty()
  city = db.StringProperty()
  birth_year = db.IntegerProperty()
  height = db.IntegerProperty()

# The Query interface prepares a query using instance methods.
q = Person.all()
q.filter("last_name =", "Smith")
q.filter("height <", 72)
q.order("-height")

# The GqlQuery interface prepares a query using a GQL query string.
q = db.GqlQuery("SELECT * FROM Person " + 
                "WHERE last_name = :1 AND height < :2 " +
                "ORDER BY height DESC",
                "Smith", 72)

# The query is not executed until results are accessed.
results = q.fetch(5)
for p in results:
  print "%s %s, %d inches tall" % (p.first_name, p.last_name, p.height)

A filter includes a property name, a comparison operator, and a value. An entity passes the filter if it has a property of the given name and its value compares to the given value as described by the operator. The entity is a result for the query if it passes all of its filters.

The filter operator can be any of the following:

  • < less than
  • <= less than or equal to
  • = equal to
  • > greater than
  • >= greater than or equal to
  • != not equal to
  • IN equal to any of the values in the provided list

The != operator actually performs 2 queries: one where all other filters are the same and the not-equal filter is replaced with a less-than filter, and one where the not-equal filter is replaces with a greater-than filter. The results are merged, in order. As described below in the discussion of inequality filters, a query can only have one not-equal filter, and such a query cannot have other inequality filters.

The IN operator also performs multiple queries, one for each item in the provided list value where all other filters are the same and the IN filter is replaces with an equal-to filter. The results are merged, in the order of the items in the list. If a query has more than IN filter, the query is performed as multiple queries, one for each combination of values in the IN filters.

A single query containing != and IN operators is limited to 30 sub-queries.

Introducing Indexes

The App Engine datastore maintains an index for every query an application intends to make. As the application makes changes to datastore entities, the datastore updates the indexes with the correct results. When the application executes a query, the datastore fetches the results directly from the corresponding index.

An application has an index for each combination of kind, filter property and operator, and sort order used in a query. Consider the example query, stated in GQL:

SELECT * FROM Person WHERE last_name = "Smith"
                       AND height < 72
                  ORDER BY height DESC

The index for this query is a table of keys for entities of the kind Person, with columns for the values of the height and last_name properties. The index is sorted by height in descending order.

Two queries of the same form but with different filter values use the same index. For example, the following query uses the same index as the query above:

SELECT * FROM Person WHERE last_name = "Jones"
                       AND height < 63
                     ORDER BY height DESC

The datastore executes a query using the following steps:

  1. The datastore identifies the index that corresponds with the query's kind, filter properties, filter operators, and sort orders.
  2. The datastore starts scanning the index at the first entity that meets all of the filter conditions using the query's filter values.
  3. The datastore continues to scan the index, returning each entity, until it finds the next entity that does not meet the filter conditions, until it reaches the end of the index, or until it has collected the maximum number of results requested by the query.

An index table contains columns for every property used in a filter or sort order. The rows are sorted by the following aspects, in order:

  • ancestors
  • property values used in equality filters
  • property values used in inequality filters
  • property values used in sort orders

Note: For the purposes of indexes, IN filters are handled like = filters, and != filters are handled like the other inequality filters.

This puts all results for every possible query that uses this index in consecutive rows in the table.

Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:

db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2", "abc", u"abc" + u"\ufffd")

This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.

This mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of queries you may be used to from other database technologies.

Entities Without a Filtered Property Are Never Returned by a Query

An index only contains entities that have every property referred to by the index. If an entity does not have a property referred to by an index, the entity will not appear in the index, and will never be a result for the query that uses the index.

Note that the App Engine datastore makes a distinction between an entity that does not possess a property and an entity that possesses the property with a null value (None). If you want every entity of a kind to be a potential result for a query, you can use a data model that assigns a default value (such as None) to properties used by query filters.

Properties that Aren't Indexed

Property values that aren't indexed are not findable by queries. This includes properties that are marked as not indexed, as well as properties with values of the long text value type (Text) or the long binary value type (Blob).

A query with a filter or sort order on a property will never match an entity whose value for the property is a Text or Blob, or which was written with that property marked as not indexed. Properties with such values behave as if the property is not set with regard to query filters and sort orders.

Property Values of Mixed Types are Ordered By Type

When two entities have properties of the same name but of different value types, an index of the property sorts the entities first by value type, then by an order appropriate to the type. For example, if two entities each have a property named "age," one with an integer value and one with a string value, the entity with the integer value will always appear before the entity with the string value when sorted by the "Age" property, regardless of the values themselves.

This is especially worth noting in the case of integers and floating point numbers, which are treated as separate types by the datastore. A property with the integer value 38 is sorted before a property with the floating point value 37.5, because all integers are sorted before floats.

Defining Indexes With Configuration

App Engine builds indexes for several simple queries by default. For other queries, the application must specify the indexes it needs in a configuration file named index.yaml. If the application running under App Engine tries to perform a query for which there is no corresponding index (either provided by default or described in index.yaml), the query will fail.

App Engine provides automatic indexes for the following forms of queries:

  • queries using only equality and ancestor filters
  • queries using only inequality filters (which can only be of a single property)
  • queries with only one sort order on a property, either ascending or descending

Other forms of queries require their indexes to be specified in index.yaml, including:

  • queries with multiple sort orders
  • queries with a sort order on keys in descending order
  • queries with one or more inequality filters on a property and one or more equality filters over other properties
  • queries with inequality filters and ancestor filters

The development web server makes managing index configuration easy: Instead of failing to execute a query that does not have an index and requires it, the development web server can generate configuration for an index that would allow the query to succeed. If your local testing of your application calls every possible query the application will make (every combination of kind, ancestor, filter and sort order), the generated entries will represent a complete set of indexes. If your testing might not exercise every possible query form, you can review and adjust the index configuration before uploading the application.

App Engine builds indexes for several simple queries by default. For other queries, the application must specify the indexes it needs in a configuration file named index.yaml. If the application running under App Engine tries to perform a query for which there is no corresponding index (either provided by default or described in index.yaml), the query will fail.

index.yaml describes each index table, including the kind, the properties needed for the query filters and sort orders, and whether or not the query uses an ancestor clause (either Query.ancestor() or a GQL ANCESTOR IS clause). The properties are listed in the order they are to be sorted: properties used in equality or IN filters first, followed by the property used in inequality filters, then the query results sort orders and their directions.

Consider once again the following example query:

SELECT * FROM Person WHERE last_name = "Smith"
                       AND height < 72
                     ORDER BY height DESC

If the application executed only this query (and possibly other queries similar to this one but with different values for "Smith" and 72), the index.yaml file would look like this:

indexes:
- kind: Person
  properties:
  - name: last_name
  - name: height
    direction: desc

When an entity is created or updated, every appropriate index is updated as well. The number of indexes that apply to an entity affects the time it takes to create or update the entity.

For more information on the syntax of index.yaml, see Configuring Indexes.

Queries on Keys

Entity keys can be the subject of a query filter or sort order, using the special name __key__ in place of the property name. The datastore considers the complete key value for such queries, including the entity's parent path, the kind, and the app-assigned key name string or system-assigned numeric ID.

A query can return entity keys instead of full entities. You can trigger this behavior by passing keys_only=True to Query, or by using SELECT __key__ in GQL.

Note: Queries that return keys are faster and cost less CPU than queries that return entities, since the keys themselves are already in the index, so the query doesn't need to fetch the actual entities. If you only need the keys from your query results — for example, if you're just going to delete the results — consider using a keys only query.

Because an entity key is unique across all entities in the system, __key__ queries make it easy to retrieve entities of a given kind in batches, such as for a batch dump of the contents of the datastore. Unlike offset, this works efficiently for any number of entities. For example:

class MainHandler(webapp.RequestHandler):
  def get(self):
    query = Entity.gql('ORDER BY __key__')

    # Use a query parameter to keep track of the last key of the last
    # batch, to know where to start the next batch.
    last_key_str = self.request.get('last')
    if last_key_str:
      last_key = db.Key(last_key_str)
      query = Entity.gql('WHERE __key__ > :1 ORDER BY __key__', last_key)

    # For batches of 20, fetch 21, then use result #20 as the "last"
    # if there is a 21st.
    entities = query.fetch(21)
    new_last_key_str = None
    if len(entities) == 21:
      new_last_key_str = str(entities[19].key())

    # Return the data and new_last_key_str.  Client would use
    # http://...?last=new_last_key_str to fetch the next batch.
    # ...

Keys are ordered first by parent path, then by kind, then by key name or ID. Kinds and key names are strings and are ordered by byte value. IDs are integers and are ordered numerically. If entities of the same parent and kind use a mix of key name strings and numeric IDs, entities with numeric IDs are considered to be less than entities with key name strings. Elements of the parent path are compared similarly: by kind (string), then by key name (string) or ID (number).

Queries involving keys use indexes just like queries involving properties, with one minor difference: Unlike with a property, a query with an equality filter on the key that also has additional filters must use a custom index defined in the app's index configuration file. As with all queries, the development web server creates appropriate configuration entries in this file when a query that needs a custom index is tested.

Restrictions on Queries

The nature of the index query mechanism imposes a few restrictions on what a query can do.

Filtering Or Sorting On a Property Requires That the Property Exists

A query filter condition or sort order for a property also implies a condition that the entity have a value for the property.

A datastore entity is not required to have a value for a property that other entities of the same kind have. A filter on a property can only match an entity with a value for the property. Entities without a value for a property used in a filter or sort order are omitted from the index built for the query.

No Filter That Matches Entities That Do Not Have a Property

It is not possible to perform a query for entities that are missing a given property. One alternative is to create a fixed (modeled) property with a default value of None, then create a filter for entities with None as the property value.

Inequality Filters Are Allowed On One Property Only

A query may only use inequality filters (<, <=, >=, >, !=) on one property across all of its filters.

For example, this query is allowed:

SELECT * FROM Person WHERE birth_year >= :min
                       AND birth_year <= :max

However, this query is not allowed, because it uses inequality filters on two different properties in the same query:

SELECT * FROM Person WHERE birth_year >= :min_year
                       AND height >= :min_height     # ERROR

Filters can combine equal (=) comparisons for different properties in the same query, including queries with one or more inequality conditions on a property. This is allowed:

SELECT * FROM Person WHERE last_name = :last_name
                       AND city = :city
                       AND birth_year >= :min_year

The query mechanism relies on all results for a query to be adjacent to one another in the index table, to avoid having to scan the entire table for results. A single index table cannot represent multiple inequality filters on multiple properties while maintaining that all results are consecutive in the table.

Properties In Inequality Filters Must Be Sorted Before Other Sort Orders

If a query has both a filter with an inequality comparison and one or more sort orders, the query must include a sort order for the property used in the inequality, and the sort order must appear before sort orders on other properties.

This query is not valid, because it uses an inequality filter and does not order by the filtered property:

SELECT * FROM Person WHERE birth_year >= :min_year
                     ORDER BY last_name              # ERROR

Similarly, this query is not valid because it does not order by the filtered property before ordering by other properties:

SELECT * FROM Person WHERE birth_year >= :min_year
                     ORDER BY last_name, birth_year  # ERROR

This query is valid:

SELECT * FROM Person WHERE birth_year >= :min_year
                     ORDER BY birth_year, last_name

To get all results that match an inequality filter, a query scans the index table for the first matching row, then returns all consecutive results until it finds a row that doesn't match. For the consecutive rows to represent the complete result set, the rows must be ordered by the inequality filter before other sort orders.

Sort Orders and Properties With Multiple Values

Due to the way properties with multiple values are indexed, the sort order for these properties is unusual:

  • If the entities are sorted by a multi-valued property in ascending order, the value used for ordering is the smallest value.
  • If the entities are sorted by a multi-valued property in descending order, the value used for ordering is the greatest value.
  • Other values do not affect the sort order, nor does the number of values.
  • In the case of a tie, the key of the entity is used as the tie-breaker.

This sort order has the unusual consequence that [1, 9] comes before [4, 5, 6, 7] in both ascending and descending order.

One important caveat is queries with both an equality filter and a sort order on a multi-valued property. In those queries, the sort order is disregarded. For single-valued properties, this is a simple optimization. Every result would have the same value for the property, so the results do not need to be sorted further.

However, multi-valued properties may have additional values. Since the sort order is disregarded, the query results may be returned in a different order than if the sort order were applied. (Restoring the dropped sort order would be expensive and require extra indices, and this use case is rare, so the query planner leaves it off.)

Big Entities and Exploding Indexes

As described above, every property (that doesn't have a Text or Blob value) of every entity is added to at least one index table, including a simple index provided by default, and any indexes described in the application's index.yaml file that refer to the property. For an entity that has one value for each property, App Engine stores a property value once in its simple index, and once for each time the property is referred to in a custom index. Each of these index entries must be updated every time the value of the property changes, so the more indexes that refer to the property, the more time it will take to update the property.

To prevent the update of an entity from taking too long, the datastore limits the number of index entries that a single entity can have. The limit is large, and most applications will not notice. However, there are some circumstances where you might encounter the limit. For example, an entity with very many single-value properties can exceed the index entry limit.

Properties with multiple values store each value as a separate entry in an index. An entity with a single property with very many values can exceed the index entry limit.

Custom indexes that refer to multiple properties with multiple values can get very large with only a few values. To completely record such properties, the index table must include a row for every permutation of the values of every property for the index.

For example, the following index (described in index.yaml syntax) includes the x and y properties for entities of the kind MyModel:

indexes:
- kind: MyModel
  properties:
  - name: x
  - name: y

The following code creates an entity with 2 values for the property x and 2 values for the property y:

class MyModel(db.Expando):
  pass

e2 = MyModel()
e2.x = ['red', 'blue']
e2.y = [1, 2]
e2.put()

To accurately represent these values, the index must store 12 property values: 2 each for the built-in indexes on x and y, and 2 for each of the 4 permutations of x and y in the custom index. With many values of multi-valued properties, this can mean an index must store very many index entries for a single entity. You could call an index that refers to multiple properties with multiple values an "exploding index," because it can get very large with just a few values.

If a put() would result in a number of index entries that exceeds the limit, the call will fail with an exception. If you create a new index that would contain a number of index entries that exceeds the limit for any entity when built, queries against the index will fail, and the index will appear in the "Error" state in the Admin Console.

To handle "Error" indexes, first remove them from your index.yaml file and run appcfg.py vacuum_indexes. Then, either reformulate the index definition and corresponding queries or remove the entities that are causing the index to "explode." Finally, add the index back to index.yaml and run appcfg.py update_indexes.

You can avoid exploding indexes by avoiding queries that would require a custom index using a list property. As described above, this includes queries with multiple sort orders, a mix of equality and inequality filters, and ancestor filters.