Atex Categorization

There are two fundamental parts to the Atex Categorization system: taxonomies, and content categorization. Taxonomies are created using content, and accessed using the Taxonomy Service. Categorizing content is done by adding an aceCategorization aspect based on one or more taxonomies to the content.

Entities, Dimensions and Taxonomies

In the Atex Categorization System, categorization happens in Dimensions, which are hierarchies of Entities. An Entity is something like a person, a location or a subject: something you might want to categorize content with. Entities may have child Entities to provide more detail (e.g. "San Francisco" could be a child of "California", which would be a child of "USA"). Entities are organized as a tree, i.e. an entity has exactly one parent, which is either another entity or a dimension.

Dimensions may be enumerable, which means that all possible Entities in the dimension are defined somewhere (normally by content). For dimensions that are not enumerable, it is assumed that there are so many possible entities that trying to list them exhaustively would be futile. Instead, an entity in such a dimension exists as soon as some content has been categorized with it.

Dimensions can be organized in Taxonomies. The same Dimension can be a member of several different taxonomies. Multiple taxonomies are useful when several organizations share the same Atex system: common dimensions can be shared, while organization-specific dimensions fulfill the different needs of each organization.

Building taxonomies

Taxonomies, dimensions and entities are all defined by content, which must have a representation in the aceTaxonomy variant.

Taxonomies, dimensions and entities are defined in very similar ways. There are two core requirements on content defining one of these: that it have an alias in the _taxonomy namespace, and that it is available in the aceTaxonomy variant as an aceTaxonomy, aceDimension or aceEntity. The way to do this is to have an aspect that is one of aceTaxonomyDefinition, aceDimensionDefinition or aceEntityDefinition, which are automatically transformed into the appropriate type by a system composer. There is a corresponding content type for each of these: aceTaxonomyDefinitionContent, aceDimensionDefinitionContent and aceEntityDefinitionContent. The content types are optional but provide a default composer mapping using aceTaxonomy.default to create the representations - this composer can be used for custom types, too, as long as they use the definition aspects.

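There are multiple types because each serves a different purpose: the definition types (aceTaxonomyDefinition, aceDimensionDefinition, aceEntityDefinition) describe a single object, the tree representation types (aceTaxonomy, aceDimension, aceEntity) also include the children of the object, and the content types exist so that there is something to attach the standard composer to.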

To describe relations between entities and dimensions, children reference their parents. This ensures that an entity can have only one parent, and keeps the size of dimension and entity definitions reasonable even where there are thousands of children. Taxonomies, on the other hand, explicitly list their dimensions because dimensions are allowed to be in more than one taxonomy and dimensions are not expected to be large in number.
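The child-to-parent referencing described above can be illustrated with a small sketch. This is not Atex code: the dictionaries are hypothetical stand-ins for entity definitions that use only the id and parent attributes, and build_children_index is an invented helper name. It shows how a parent-to-children view can be derived from definitions that only point upwards.

```python
from collections import defaultdict

# Hypothetical definitions: each entity names its single parent,
# which is either a dimension or another entity.
entity_definitions = [
    {"id": "entity.location.usa", "parent": "dimension.location"},
    {"id": "entity.location.usa.california", "parent": "entity.location.usa"},
    {"id": "entity.location.usa.california.sf", "parent": "entity.location.usa.california"},
]


def build_children_index(definitions):
    """Invert child -> parent references into a parent -> children index."""
    children = defaultdict(list)
    for definition in definitions:
        children[definition["parent"]].append(definition["id"])
    return dict(children)


children_index = build_children_index(entity_definitions)
```

Because each definition carries exactly one parent value, an entity cannot end up under two parents, and no parent definition grows as children are added.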

aceTaxonomyDefinition

A taxonomy definition contains a name, an ID, and a list of dimensions. The list of dimensions is represented as a list of aliases without namespace (always in the _taxonomy namespace).

Attribute	Description
id	Should match the content's alias in _taxonomy
name	Default human readable name for the taxonomy
dimensions	A list of strings that are un-namespaced aliases for dimensions

Currently taxonomy names cannot be localized.

aceDimensionDefinition

A dimension definition contains a name, an ID, a flag specifying whether the dimension is enumerable or not, and localizations. The definition does not include a list of the dimension's child entities; instead, the entities reference their parents.

Attribute	Description
id	Should match the content's alias in _taxonomy
name	Default human readable name for the dimension
enumerable	Defines whether the dimension is considered enumerable or not
localizations	An object mapping language codes to the dimension's name in that language

aceEntityDefinition

An entity definition contains a name, an ID, a parent, and localizations. The definition does not include a list of the entity's children; instead, entities reference their parents.

Attribute	Description
id	Should match the content's alias in _taxonomy
name	Default human readable name for the entity
localizations	An object mapping language codes to the entity's name in that language
parent	The entity's parent, either another entity or a dimension
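To make the three attribute tables above easier to compare, here is a minimal sketch that mirrors them as Python dataclasses. The class names and the choice to represent parent and dimensions as plain ID strings are assumptions made for illustration; this is not the storage format of the definition aspects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaxonomyDefinition:
    id: str                      # should match the content's alias in _taxonomy
    name: str                    # default human readable name
    dimensions: List[str] = field(default_factory=list)  # un-namespaced dimension aliases


@dataclass
class DimensionDefinition:
    id: str
    name: str
    enumerable: bool
    localizations: Dict[str, str] = field(default_factory=dict)  # language code -> name


@dataclass
class EntityDefinition:
    id: str
    name: str
    parent: str                  # assumed here to be the ID of an entity or dimension
    localizations: Dict[str, str] = field(default_factory=dict)


subject = DimensionDefinition("dimension.subject", "Subject", True, {"sv": "Ämne"})
sport = EntityDefinition("entity.subject.sport", "sport", parent=subject.id)
taxonomy = TaxonomyDefinition("custom.taxonomy", "Custom Taxonomy", [subject.id])
```

Note the asymmetry: the taxonomy lists its dimensions, while the entity points at its parent and the dimension lists nothing.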

Taxonomy Service

The Taxonomy Service is a REST web service for accessing Atex Metadata taxonomies. It contains two endpoints: structure and complete. structure is used to fetch the taxonomy structure definitions stored in the system, while complete is intended to support autocompletion of entities. The Taxonomy Service uses JSON to represent results.

The functional domain, i.e. the first path segment of the URL, currently has two possible values: structure and complete. The structure domain concerns examining the underlying taxonomy structure.

Structure

The structure endpoint lets you get a taxonomy, dimension or entity. It is located under /structure in the Taxonomy Service and only supports GET requests.

All operations for the structure endpoint follow the same URL format:

/structure/{id}

where {id} is the ID of the taxonomy, dimension or entity you want to get. The ID can also be left out, in which case you get all taxonomies.
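As a sketch of the URL format above, the following helper builds structure URLs. The base URL is an assumption (the real host and mount path of the Taxonomy Service depend on the installation), and structure_url is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import quote

# Assumed base URL; replace with the location of the Taxonomy Service
# in your installation.
BASE_URL = "http://localhost:8080/taxonomy"


def structure_url(item_id: Optional[str] = None, base_url: str = BASE_URL) -> str:
    """Build a /structure URL; omitting item_id addresses all taxonomies."""
    if item_id is None:
        return f"{base_url}/structure"
    # Percent-encode the ID so it is safe as a single path segment.
    return f"{base_url}/structure/{quote(item_id, safe='')}"
```

For example, structure_url("dimension.subject") addresses one dimension, while structure_url() addresses the list of all taxonomies.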

Depth

When fetching a structure, you can request a certain maximum depth, to limit the size of the response. By default the maximum depth is 100, which should be enough for almost all taxonomies.

To indicate if there are items that would be included in the response but are omitted because of the maximum depth, a special field called childrenOmitted is used. This field is just information about the response from the service (so that e.g. a widget can know whether the item should be expandable without fetching the whole structure) and should not be included when using the response for categorization.
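The two uses of childrenOmitted described above can be sketched as follows: a widget-style check for whether an item should be expandable, and a cleanup step that drops the flag before the data is reused. Both function names are hypothetical, and the sketch assumes children appear under the entities key (or dimensions, for a taxonomy) as in the response examples in this document.

```python
def is_expandable(item):
    """True if the item has children, whether included or cut off by the depth limit."""
    return bool(item.get("entities")) or bool(item.get("childrenOmitted"))


def strip_children_omitted(item):
    """Return a copy of a structure item without childrenOmitted, recursively."""
    cleaned = {key: value for key, value in item.items() if key != "childrenOmitted"}
    for child_key in ("dimensions", "entities"):
        if child_key in cleaned:
            cleaned[child_key] = [strip_children_omitted(child) for child in cleaned[child_key]]
    return cleaned


# Hand-written sample shaped like a depth-limited response.
truncated = {
    "id": "entity.subject.sport",
    "name": "sport",
    "entities": [],
    "childrenOmitted": True,
}
```

Here is_expandable(truncated) is true even though the entities list is empty, because the children were only left out by the depth limit.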

Error handling

If you request something that isn't a known item, the service responds with 404 Not Found. The body of that response is undefined, so clients should not try to parse it.
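A client might handle the 404 case along these lines. This is a sketch using only the Python standard library; fetch_structure and its opener parameter are hypothetical names, and the opener is injectable only so the function can be exercised without a running service.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_structure(url, opener=urlopen):
    """Return the parsed JSON for a structure URL, or None if the item is unknown."""
    try:
        with opener(url) as response:
            return json.loads(response.read().decode("utf-8"))
    except HTTPError as error:
        if error.code == 404:
            # The body of a 404 response is undefined, so it is not parsed.
            return None
        # Any other HTTP error is left for the caller to handle.
        raise
```

Returning None for unknown items keeps the "not found" case distinct from transport or server errors, which still surface as exceptions.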

Taxonomy format

When you get a taxonomy, the response is an object containing the taxonomy's name, ID, and the dimensions it contains. For details about dimensions, see below.

Attribute	Description
id	The taxonomy's ID
name	The human readable name
dimensions	The dimensions in this taxonomy

{
  "id" : "custom.taxonomy",
  "name" : "Custom Taxonomy",
  "dimensions" : [
    {
      "name" : "Subject",
      "id" : "dimension.subject",
        ...
    },
    {
      "name" : "Person",
      "id" : "dimension.person",
        ...
    }
  ]
}

Dimension format

When you get a dimension, the response is an object with information about the dimension at the top level, and a list of its entities. For more details about the entities, see the section about entities.

Attribute	Description
id	The dimension's ID
name	The human readable name
localizations	A map mapping language codes to the dimension's name in those languages
enumerable	True if only predefined entities are allowed, false otherwise
entities	The entities in the dimension

Abbreviated example

{
  "entities" : [
    {
      "id" : "entity.subject.education",
      "name" : "education",
      "entities" : [],
        ...
    } ],
  "name" : "Subject",
  "childrenOmitted" : false,
  "localizations" : {
    "sv" : "Ämne",
    "_type" : "MAP::string"
  },
  "id" : "dimension.subject",
  "enumerable" : true
}
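A client that displays dimension or entity names will typically want the localized name with a fallback to the default. The sketch below shows one way to do that; localized_name is a hypothetical helper, and treating the "_type" key as a type marker rather than a language code is an assumption based on the example above.

```python
def localized_name(item, language):
    """Return the item's name in the given language, or its default name."""
    localizations = item.get("localizations", {})
    # "_type" appears in the serialized map next to the language codes;
    # it is assumed here to be a type marker, never a language.
    if language != "_type" and language in localizations:
        return localizations[language]
    return item["name"]


dimension = {
    "id": "dimension.subject",
    "name": "Subject",
    "localizations": {"sv": "Ämne", "_type": "MAP::string"},
}
```

With this data, asking for "sv" yields the Swedish name, and any language without an entry falls back to "Subject".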

Entity format

When you get an entity (GET /structure/{entity id}), the response is the entity and its children.

Attribute	Description
id	The entity's ID
name	The human readable name
localizations	A map mapping language codes to the entity's name in those languages
entities	The children of this entity
attributes	An object with custom attributes, the values of which are strings

Abbreviated example

{
  "id" : "entity.subject.sport",
  "name" : "sport",
  "localizations" : {
    "_type" : "MAP::string"
  },
  "entities" : [
    {
      "name" : "bowling",
      "attributes" : [],
      "localizations" : {
        "_type" : "MAP::string"
      },
      "entities" : [],
      "id" : "entity.subject.sport.bowling",
      "childrenOmitted" : false
    },
    {
      "entities" : [
        {
          "id" : "entity.subject.sport.cricket.ashes",
          "childrenOmitted" : false,
          "attributes" : [],
          "localizations" : {
            "_type" : "MAP::string"
          },
          "entities" : [],
          "name" : "ashes"
        }
      ],
      "localizations" : {
        "_type" : "MAP::string"
      },
      "attributes" : [],
      "id" : "entity.subject.sport.cricket",
      "childrenOmitted" : false,
      "name" : "cricket"
    }
  ],
  "childrenOmitted" : false,
  "attributes" : []
}
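Since entities nest through the entities list, a response like the one above is naturally processed recursively. The following sketch walks the tree depth-first; iter_entities is a hypothetical helper, and the sample data is a trimmed copy of the example above.

```python
def iter_entities(item, depth=0):
    """Yield (depth, id) for an entity and all its descendants, depth-first."""
    yield depth, item["id"]
    for child in item.get("entities", []):
        yield from iter_entities(child, depth + 1)


# Trimmed copy of the example response, keeping only id and entities.
sport = {
    "id": "entity.subject.sport",
    "entities": [
        {"id": "entity.subject.sport.bowling", "entities": []},
        {
            "id": "entity.subject.sport.cricket",
            "entities": [
                {"id": "entity.subject.sport.cricket.ashes", "entities": []},
            ],
        },
    ],
}

for depth, entity_id in iter_entities(sport):
    print("  " * depth + entity_id)
```

The same walk works for a dimension response, since a dimension also exposes its top-level entities under entities.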

Completion

The autocompletion service is located under /complete. It takes two path arguments: a dimension ID, followed by the user input to complete.

/complete/{dimension id}/{user input}

This operation will return all entities in the dimension that match {user input}. {user input} is a string where each whitespace-separated token is matched against whitespace-separated tokens in indexed entities, or a /-separated series of such strings each matching one level in the entity hierarchy. Keep in mind that the user input part needs to be URL encoded.
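The URL encoding requirement can be sketched as below. The base URL and the helper name completion_url are assumptions. This document does not say whether a "/" used as a level separator must itself be encoded; the sketch leaves it unencoded, which is also an assumption to verify against your installation.

```python
from urllib.parse import quote

# Assumed base URL; the real location of the Taxonomy Service is
# installation-specific.
BASE_URL = "http://localhost:8080/taxonomy"


def completion_url(dimension_id, user_input, base_url=BASE_URL):
    """Build a /complete URL with the user input percent-encoded."""
    # Spaces and other reserved characters are encoded; "/" is kept as-is
    # on the assumption that it separates hierarchy levels in the path.
    encoded_input = quote(user_input, safe="/")
    return f"{base_url}/complete/{quote(dimension_id, safe='')}/{encoded_input}"
```

For instance, the input "El Pr" ends up as El%20Pr in the path.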

Example of completion

El Pr would match Elvis Presley, but not President Elvis. Pr/Ge Wa would match Presidents/George Washington but not President George Washington. Pr would match President George Washington and Presidents (i.e. the parent of Presidents/George Washington), but not Presidents/George Washington.
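The examples above are consistent with the following simplified model, in which each input token must be a prefix of the candidate token at the same position, and the number of "/"-separated levels must be equal. This is only one reading of the examples, written as a sketch: the real matching is done by the search index and may differ, for instance in case handling or in whether tokens may match at other positions. The function names are hypothetical.

```python
def _level_matches(input_level, candidate_level):
    """Each input token must prefix the candidate token at the same position."""
    input_tokens = input_level.lower().split()
    candidate_tokens = candidate_level.lower().split()
    if len(input_tokens) > len(candidate_tokens):
        return False
    return all(
        candidate.startswith(token)
        for token, candidate in zip(input_tokens, candidate_tokens)
    )


def matches(user_input, candidate):
    """Match a '/'-separated input against a '/'-separated entity path."""
    input_levels = user_input.split("/")
    candidate_levels = candidate.split("/")
    # Assumed: the input must address exactly as many levels as the candidate has.
    if len(input_levels) != len(candidate_levels):
        return False
    return all(
        _level_matches(level, candidate_level)
        for level, candidate_level in zip(input_levels, candidate_levels)
    )
```

Under this model, matches("El Pr", "Elvis Presley") holds while matches("El Pr", "President Elvis") does not, in line with the examples.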

What can be autocompleted

Autocompletion is based on the Solr fields tag_autocomplete_{dimension id}_lc and tag_{dimension id}. These fields are automatically populated from the aceCategorization aspect on content that is indexed, but custom entities must have an index mapping that indexes into these fields.
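When writing an index mapping for custom entities, the two field names can be derived from the dimension ID. The helper below is a hypothetical convenience that only fills in the patterns given above; it assumes the dimension ID is inserted verbatim, without any escaping.

```python
def autocomplete_fields(dimension_id):
    """Return the two Solr field names used for autocompletion in a dimension."""
    return (
        f"tag_autocomplete_{dimension_id}_lc",
        f"tag_{dimension_id}",
    )
```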

Response format

The response uses a variation of the dimension format used in the structure web service, with each match included as an entity in the entities list.

{
  "id": "department.categorydimension.tag.Person",
  "name": "Person",
  "enumerable": false,
  "entities": [
    {
      "id": "President George Washington",
      "name": "President George Washington",
      "entities": [],
      "attributes": [],
      "childrenOmitted": false
    },
    {
      "id": "Presidents",
      "name": "Presidents",
      "entities": [],
      "attributes": [],
      "childrenOmitted": false
    }
  ]
}
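A client consuming this response, for example to fill a suggestion list, might reduce it to ID and name pairs. The sketch below does that; suggestions is a hypothetical helper and the sample data is a shortened copy of the response above.

```python
import json

# Shortened copy of the example response.
response_text = """
{
  "id": "department.categorydimension.tag.Person",
  "name": "Person",
  "enumerable": false,
  "entities": [
    {"id": "President George Washington", "name": "President George Washington", "entities": []},
    {"id": "Presidents", "name": "Presidents", "entities": []}
  ]
}
"""


def suggestions(response):
    """Return (id, name) pairs for each match in a completion response."""
    return [(entity["id"], entity["name"]) for entity in response.get("entities", [])]


matches_found = suggestions(json.loads(response_text))
```

Each match is an entry in the top-level entities list, so no recursion is needed here.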

Content Hierarchy

A special taxonomy called contentHierarchy is used to model hierarchies among content items.