Atex Categorization
There are two fundamental parts to the Atex Categorization system: taxonomies, and content categorization. Taxonomies are created using content, and accessed using the Taxonomy Service. Categorizing content is done by adding an aceCategorization aspect based on one or more taxonomies to the content.
Entities, Dimensions and Taxonomies
In the Atex Categorization system, categorization happens in Dimensions, which are hierarchies of Entities. An Entity is something like a person, a location or a subject: something you might want to categorize content with. Entities may have child Entities, to provide more detail (e.g. "San Francisco" could be a child of "California", which would be a child of "USA"). Entities are organized as a tree, i.e. any given entity cannot be a child of more than one entity or dimension.
Dimensions may be enumerable, which means that all possible Entities in the dimension are defined somewhere (normally by content). For dimensions that are not enumerable, it is assumed that there are so many possible entities that trying to list them exhaustively would be futile. Instead, an entity exists once something has been categorized with it.
Dimensions can be organized in Taxonomies. The same Dimension can be a member of several different taxonomies. Multiple taxonomies are useful when several organizations share the same Atex system: common dimensions can be shared, while organization-specific dimensions cover the different needs of each organization.
Building taxonomies
Taxonomies, dimensions and entities are all defined by content, which
must have a representation in the aceTaxonomy variant.
Taxonomies, dimensions and entities are defined in very similar ways. There are two core requirements on content defining one of these: that it have an alias in the _taxonomy namespace, and that it is available in the aceTaxonomy variant as an aceTaxonomy, aceDimension or aceEntity. The way to do this is to have an aspect that is one of aceTaxonomyDefinition, aceDimensionDefinition or aceEntityDefinition, which are automatically transformed into the appropriate type by a system composer. There is a corresponding content type for each of these: aceTaxonomyDefinitionContent, aceDimensionDefinitionContent and aceEntityDefinitionContent. The content types are optional but provide a default composer mapping using aceTaxonomy.default to create the representations - this composer can be used for custom types, too, as long as they use the definition aspects.
There are multiple types because the definition types deal with a single object, while the tree representation types also include the children of the object. The content types exist so that there is something to attach the standard composer to.
To describe relations between entities and dimensions, children reference their parents. This ensures that an entity can have only one parent, and keeps the size of dimension and entity definitions reasonable even where there are thousands of children. Taxonomies, on the other hand, explicitly list their dimensions because dimensions are allowed to be in more than one taxonomy and dimensions are not expected to be large in number.
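The parent-reference model described above can be illustrated with a minimal sketch. It assumes that each definition is available as a plain dictionary and that the parent attribute holds the parent's ID as a string; the sample IDs and the helper name build_children_index are hypothetical and not part of the Atex API.

```python
from collections import defaultdict


def build_children_index(definitions):
    """Map each parent ID to the IDs of the items that reference it."""
    children = defaultdict(list)
    for item in definitions:
        parent = item.get("parent")  # dimensions have no parent
        if parent is not None:
            children[parent].append(item["id"])
    return dict(children)


# Hypothetical definitions; only the attribute names come from the tables below.
definitions = [
    {"id": "dimension.location", "name": "Location"},
    {"id": "entity.location.usa", "name": "USA",
     "parent": "dimension.location"},
    {"id": "entity.location.usa.california", "name": "California",
     "parent": "entity.location.usa"},
]

index = build_children_index(definitions)
```

Because every child stores exactly one parent, an item can never end up under two parents, and the parent definitions stay small no matter how many children reference them.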
aceTaxonomyDefinition
A taxonomy definition contains a name, an ID, and a list of dimensions. The list of dimensions is represented as a list of aliases without namespace (always in the _taxonomy namespace).
| Attribute | Description |
|---|---|
| id | Should match the content's alias in _taxonomy |
| name | Default human readable name for the taxonomy |
| dimensions | A list of strings that are un-namespaced aliases for dimensions |
Currently taxonomy names cannot be localized.
aceDimensionDefinition
A dimension definition contains a name, an ID, a flag specifying whether the dimension is enumerable or not, and localizations. The definition does not include a list of the dimension's child entities; instead, the entities reference their parents.
| Attribute | Description |
|---|---|
| id | Should match the content's alias in _taxonomy |
| name | Default human readable name for the dimension |
| enumerable | Defines whether the dimension is considered enumerable or not |
| localizations | An object mapping language codes to the dimension's name in that language |
aceEntityDefinition
An entity definition contains a name, an ID, a parent, and localizations. The definition does not include a list of the entity's children; instead, entities reference their parents.
| Attribute | Description |
|---|---|
| id | Should match the content's alias in _taxonomy |
| name | Default human readable name for the entity |
| localizations | An object mapping language codes to the entity's name in that language |
| parent | The entity's parent, either another entity or a dimension |
Taxonomy Service
The Taxonomy Service is a REST web service for accessing Atex Metadata
taxonomies. It contains two endpoints: structure and
complete. structure is used to fetch the taxonomy structure
definitions stored in the system, while complete is intended to
support autocompletion of entities. The Taxonomy Service uses JSON to
represent results.
There are currently two possible values for the {functional-domain} part of the URL: structure and complete. The structure domain concerns examining the underlying taxonomy structure.
Structure
The structure endpoint lets you get a taxonomy, dimension or entity. It is located under /structure in the Taxonomy Service and only supports GET requests.
All operations for the structure endpoint follow the same URL format:
/structure/{id}
where {id} is the ID of the taxonomy, dimension or entity you want to get. The ID can also be left out, in which case you will get all taxonomies.
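As a minimal sketch of the URL format, the following Python builds a structure URL and fetches it with the standard library. The base URL is an assumption (the source does not say where the Taxonomy Service is deployed), and the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: where the Taxonomy Service is deployed. Adjust to your system.
BASE_URL = "http://localhost:8080/taxonomy"


def structure_url(base_url, item_id=None):
    """Build /structure/{id}; without an ID the URL addresses all taxonomies."""
    url = base_url.rstrip("/") + "/structure"
    if item_id:
        url += "/" + quote(item_id, safe="")
    return url


def fetch_structure(base_url, item_id=None):
    """GET the taxonomy, dimension or entity and parse the JSON response."""
    with urlopen(structure_url(base_url, item_id)) as response:
        return json.load(response)
```

For example, structure_url(BASE_URL, "dimension.subject") addresses a single dimension, while structure_url(BASE_URL) addresses the list of all taxonomies.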
Depth
When fetching a structure, you can request a certain maximum depth, to limit the size of the response. By default the maximum depth is 100, which should be enough for almost all taxonomies.
To indicate if there are items that would be included in the response but are omitted because of the maximum depth, a special field called childrenOmitted is used. This field is just information about the response from the service (so that e.g. a widget can know whether the item should be expandable without fetching the whole structure) and should not be included when using the response for categorization.
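A minimal sketch of how a client might treat childrenOmitted, assuming the response has already been parsed into Python dictionaries. Both helper names are hypothetical: one removes the field before the data is reused for categorization, the other decides whether a widget should show an item as expandable.

```python
def strip_children_omitted(item):
    """Return a copy of a structure response without childrenOmitted fields."""
    nested_keys = ("entities", "dimensions")
    cleaned = {
        key: value
        for key, value in item.items()
        if key != "childrenOmitted" and key not in nested_keys
    }
    # Recurse into child lists so the field is removed at every level.
    for key in nested_keys:
        if key in item:
            cleaned[key] = [strip_children_omitted(child) for child in item[key]]
    return cleaned


def is_expandable(item):
    """True if the item has children, listed or omitted by the depth limit."""
    return bool(item.get("entities")) or bool(item.get("childrenOmitted", False))
```

The copy leaves the original response untouched, so the same parsed object can still drive the widget after a cleaned version has been handed to categorization code.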
Error handling
If you request something that isn't a known item, the service will respond with a 404 Not Found. The body of the response is undefined in that case, so clients should not rely on it.
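One way a client could handle this is sketched below, using only the Python standard library. The function name and the injectable opener parameter are hypothetical conveniences for illustration; the sketch ignores the body of a 404 response and lets any other HTTP error propagate.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def get_item_or_none(url, opener=urlopen):
    """Return the parsed JSON item, or None if the service answers 404."""
    try:
        with opener(url) as response:
            return json.load(response)
    except HTTPError as error:
        if error.code == 404:
            # Unknown taxonomy, dimension or entity; the body is undefined.
            return None
        raise  # other errors are not part of the documented behavior
```

Returning None keeps "not found" distinct from transport or server failures, which still surface as exceptions.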
Taxonomy format
When you get a taxonomy, the response is an object containing the taxonomy's name, ID, and the dimensions it contains. For details about dimensions, see below.
| Attribute | Description |
|---|---|
| id | The taxonomy's ID |
| name | The human readable name |
| dimensions | The dimensions in this taxonomy |
    {
      "id" : "custom.taxonomy",
      "name" : "Custom Taxonomy",
      "dimensions" : [
        {
          "name" : "Subject",
          "id" : "dimension.subject",
          ...
        },
        {
          "name" : "Person",
          "id" : "dimension.person",
          ...
        }
      ]
    }
Dimension format
When you get a dimension, the response is an object with information about the dimension at the top level, and a list of its entities. For more details about the entities, see the section about entities.
| Attribute | Description |
|---|---|
| id | The dimension's ID |
| name | The human readable name |
| localizations | A map mapping language codes to the dimension's name in those languages |
| enumerable | True if only predefined entities are allowed, false otherwise |
| entities | The entities in the dimension |
Abbreviated example
    {
      "entities" : [
        {
          "id" : "entity.subject.education",
          "name" : "education",
          "entities" : [],
          ...
        }
      ],
      "name" : "Subject",
      "childrenOmitted" : false,
      "localizations" : {
        "sv" : "Ämne",
        "_type" : "MAP::string"
      },
      "id" : "dimension.subject",
      "enumerable" : true
    }
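The localizations map can be used to pick a display name. The sketch below assumes the response has been parsed into a dictionary and falls back to the default name when no localization exists; the helper name is hypothetical, and the _type key is treated as metadata rather than a language code, which is an inference from the example above.

```python
def localized_name(item, language):
    """Return the item's name in the given language, or its default name."""
    localizations = item.get("localizations", {})
    name = localizations.get(language)
    # "_type" appears in the map as type metadata, not as a language code.
    if language != "_type" and isinstance(name, str):
        return name
    return item["name"]


dimension = {
    "id": "dimension.subject",
    "name": "Subject",
    "localizations": {"sv": "Ämne", "_type": "MAP::string"},
}
```

With this data, localized_name(dimension, "sv") picks the Swedish name, and a language without an entry falls back to "Subject".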
Describe entity: GET /structure/{entity id}
Returns the entity and its children.
| Attribute | Description |
|---|---|
| id | The entity's ID |
| name | The human readable name |
| localizations | A map mapping language codes to the entity's name in those languages |
| entities | The children of this entity |
| attributes | An object with custom attributes, the values of which are strings |
Abbreviated example
    {
      "id" : "entity.subject.sport",
      "name" : "sport",
      "localizations" : {
        "_type" : "MAP::string"
      },
      "entities" : [
        {
          "name" : "bowling",
          "attributes" : [],
          "localizations" : {
            "_type" : "MAP::string"
          },
          "entities" : [],
          "id" : "entity.subject.sport.bowling",
          "childrenOmitted" : false
        },
        {
          "entities" : [
            {
              "id" : "entity.subject.sport.cricket.ashes",
              "childrenOmitted" : false,
              "attributes" : [],
              "localizations" : {
                "_type" : "MAP::string"
              },
              "entities" : [],
              "name" : "ashes"
            }
          ],
          "localizations" : {
            "_type" : "MAP::string"
          },
          "attributes" : [],
          "id" : "entity.subject.sport.cricket",
          "childrenOmitted" : false,
          "name" : "cricket"
        }
      ],
      "childrenOmitted" : false,
      "attributes" : []
    }
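Since entities nest through the entities attribute, a client usually walks the response recursively. The following is a minimal sketch of a depth-first traversal over a parsed response; the helper name is hypothetical and the sample data is a trimmed version of the example above.

```python
def iter_entities(item, path=()):
    """Yield (path, entity) for every entity nested under item, depth first."""
    for child in item.get("entities", []):
        child_path = path + (child["name"],)
        yield child_path, child
        yield from iter_entities(child, child_path)


sport = {
    "id": "entity.subject.sport",
    "name": "sport",
    "entities": [
        {"id": "entity.subject.sport.bowling", "name": "bowling", "entities": []},
        {
            "id": "entity.subject.sport.cricket",
            "name": "cricket",
            "entities": [
                {"id": "entity.subject.sport.cricket.ashes", "name": "ashes",
                 "entities": []}
            ],
        },
    ],
}

for path, entity in iter_entities(sport):
    print("/".join(path), "->", entity["id"])
```

The same function works on a dimension response, because dimensions list their top-level entities under the same attribute.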
Completion
The auto completion service is located under /complete. It takes two path arguments: a dimension ID, followed by the user input to complete.
/complete/{dimension id}/{user input}
This operation will return all entities in the dimension that match {user input}. {user input} is a string where each whitespace-separated token is matched against whitespace-separated tokens in indexed entities, or a /-separated series of such strings each matching one level in the entity hierarchy. Keep in mind that the user input part needs to be URL encoded.
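A minimal sketch of building a completion URL with the user input encoded. The base URL and function name are assumptions. The sketch also assumes that the / level separator is sent as a literal slash while everything else in the input is percent-encoded; the source does not state how the separator itself should be treated.

```python
from urllib.parse import quote


def completion_url(base_url, dimension_id, user_input):
    """Build /complete/{dimension id}/{user input} with the input encoded."""
    return "{}/complete/{}/{}".format(
        base_url.rstrip("/"),
        quote(dimension_id, safe=""),
        # Assumption: "/" separates hierarchy levels and stays unencoded.
        quote(user_input, safe="/"),
    )
```

For example, the input "El Pr" ends up as El%20Pr in the path, so the space between the two tokens survives the request.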
Example of completion
El Pr would match Elvis Presley, but not President Elvis. Pr/Ge Wa would match Presidents/George Washington but not President George Washington. Pr would match President George Washington and Presidents (i.e. the parent of Presidents/George Washington), but not Presidents/George Washington.
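The examples above can be reproduced with a small sketch of the matching rule. This is only one reading of the examples, not the service's implementation (the real matching is done against the search index): it assumes case-insensitive prefix matching, that input tokens are compared with entity tokens position by position from the start of each level, and that input and entity must have the same number of levels.

```python
def matches(user_input, entity_path):
    """Check user input against an entity path, one hierarchy level at a time."""
    input_levels = user_input.split("/")
    entity_levels = entity_path.split("/")
    if len(input_levels) != len(entity_levels):
        return False
    for typed, indexed in zip(input_levels, entity_levels):
        typed_tokens = typed.lower().split()
        indexed_tokens = indexed.lower().split()
        if len(typed_tokens) > len(indexed_tokens):
            return False
        # Each typed token must be a prefix of the token in the same position.
        for typed_token, indexed_token in zip(typed_tokens, indexed_tokens):
            if not indexed_token.startswith(typed_token):
                return False
    return True
```

Under these assumptions, "El Pr" fails against "President Elvis" because "El" is compared with "President", the first token.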
What can be autocompleted
Auto completion is based on the Solr fields tag_autocomplete_{dimension id}_lc and tag_{dimension id}. These fields are automatically populated from the aceCategorization aspect on content that is indexed, but custom entities must have an index mapping that indexes into these fields.
Response format
The response uses a variation of the dimension format used in the structure web service, with each match included as an entity in the entities list.
    {
      "id": "department.categorydimension.tag.Person",
      "name": "Person",
      "enumerable": false,
      "entities": [
        {
          "id": "President George Washington",
          "name": "President George Washington",
          "entities": [],
          "attributes": [],
          "childrenOmitted": false
        },
        {
          "id": "Presidents",
          "name": "Presidents",
          "entities": [],
          "attributes": [],
          "childrenOmitted": false
        }
      ]
    }
Content Hierarchy
A special taxonomy called contentHierarchy is used to model hierarchies among content items.