This chapter describes a production installation; for a developer installation, see Quickstart.
Scaling and resilience
Most of the components that ACE contains and depends on are horizontally scalable. For resilience we recommend running at least three nodes of each service, so that maintenance can be done on one node without losing redundancy. Where a service has additional scaling considerations, they are mentioned in the section for that service.
For ease of scaling, we recommend using Docker Swarm for the ACE services.
Overview of installation
This is an overview of how to install ACE, see below for details on each component.
- Setup all of the prerequisites.
- Create a docker compose file or similar for ACE.
- Configure the ACE services.
- Start the ACE services.
- Import content.
Prerequisites
ACE depends on a number of third party services to run:
- Docker >= 17.12.0-ce
- docker-compose >= 1.18
- Couchbase 5.0.1-ce
- SolrCloud 7.1.0
- Kafka 0.11.0.0
- Zookeeper 3.4.6
- LDAP server
Docker and docker-compose need no special configuration; they must simply be installed where ACE is to be run. For the rest of the services, either the service or ACE must be configured in certain ways, as described below.
Installation of the third party software is not described here, but we provide links to their own documentation.
Optional dependencies
- HTTP cache
While it is not necessary for ACE to function, using an HTTP cache between ACE's web services and end users is strongly recommended, especially for the image service, which does a lot of work for each request. Configuring the cache is not documented in detail here, as there are many options, but ACE provides reasonable cache headers by default, so no complicated configuration should be needed to get correct behavior.
Docker and docker-compose
Link: Docker
ACE has no special requirements on Docker configuration.
Couchbase
Link: Couchbase.
By default ACE will initially connect to Couchbase on a host named ace-couch; see below for how to connect to all your nodes. We recommend that the bucket be set up with two replicas.
ACE stores data in a Couchbase bucket named cmbucket by default, which must also have the following N1QL indexes defined:
alias_index
CREATE INDEX `alias_index` ON `cmbucket`(`contentId`) WHERE ((meta().`id`) like "a:%")
alias_meta_index
CREATE INDEX `alias_meta_index` ON `cmbucket`((meta().`id`)) WHERE ((((meta().`id`) like "a:\\_role/%") or ((meta().`id`) like "a:\\_type/%")) or ((meta().`id`) like "a:\\_workspace/%"))
content_index
CREATE INDEX `content_index` ON `cmbucket`((meta().`id`),`mainAlias`,`revision`) WHERE ((meta().`id`) like "c:%")
version_index
CREATE INDEX `version_index` ON `cmbucket`((meta().`id`),(`system`.`id`),(`system`.`contentType`)) WHERE ((meta().`id`) like "v:%")
You can create these indexes using any tool of your choice (Couchbase Admin UI, cbq shell application, etc). If you are using a bucket other than the default cmbucket, remember to replace the name of the bucket in the above definitions with the name of your own bucket.
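As a sketch of that substitution, the default definitions can be rewritten mechanically for your own bucket. The bucket name `mybucket` below is a hypothetical example, and only the `alias_index` definition is shown:

```shell
# Rewrite one of the index definitions for a custom bucket name.
# "mybucket" is a hypothetical name; substitute your own.
BUCKET=mybucket
ALIAS_INDEX=$(printf 'CREATE INDEX `alias_index` ON `cmbucket`(`contentId`) WHERE ((meta().`id`) like "a:%%")' \
  | sed "s/cmbucket/${BUCKET}/")
echo "$ALIAS_INDEX"
```

The rewritten statement can then be executed in the Couchbase Admin UI or the cbq shell, as described above.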
ACE needs full access to the bucket.
SolrCloud
Link: SolrCloud
We recommend that each collection have at least 3 shards and a replication factor of 2.
A standard installation of ACE uses two collections, internal and public. For requirements on the schema, see Required fields. In addition, the indexer requires the following update request processor chain:
...
<updateRequestProcessorChain name="newerRevision" default="false">
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.DocBasedVersionConstraintsProcessorFactory">
<str name="versionField">aceRevision_l</str>
<bool name="ignoreOldUpdates">true</bool>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
...
This chain is very important when running multiple indexers: it prevents an update carrying an older revision from overwriting a document that has already been indexed with a newer revision.
By default ACE will connect to SolrCloud on a host named ace-solr. Because SolrCloud supports load balancing, there is no need to tell ACE about every server as long as there is a load balancer (such as the one built in to Docker Swarm).
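The recommended layout could, for example, be created with Solr's control script. This is a sketch only: the configset name ace is an assumption, and your collections must use a configuration that meets the schema requirements above.

```shell
# Create the two standard collections with 3 shards and replication factor 2.
# Assumes a configset named "ace" has already been uploaded to ZooKeeper;
# substitute the name of your own configset.
bin/solr create -c internal -shards 3 -replicationFactor 2 -n ace
bin/solr create -c public -shards 3 -replicationFactor 2 -n ace
```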
Example config
Kafka
Link: Kafka
By default ACE will initially connect to Kafka on a host named ace-kafka; see below for how to connect to all your nodes.
ACE uses two topics: aceEvents and aceContentEvents. These topics should only have one partition, but be replicated:
bin/kafka-topics.sh --create --zookeeper <zookeeper-host>:2181 --replication-factor 3 --partitions 1 --topic aceEvents
bin/kafka-topics.sh --create --zookeeper <zookeeper-host>:2181 --replication-factor 3 --partitions 1 --topic aceContentEvents
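You can then verify that the topics were created with the expected partition and replica layout using the same tool:

```shell
# Show partition count and replica placement for the ACE topics.
bin/kafka-topics.sh --describe --zookeeper <zookeeper-host>:2181 --topic aceEvents
bin/kafka-topics.sh --describe --zookeeper <zookeeper-host>:2181 --topic aceContentEvents
```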
Zookeeper
Link: Zookeeper
Zookeeper is not used directly by ACE; it is used by SolrCloud and Kafka. We recommend running at least five Zookeeper nodes, so that a two-node failure can be handled without downtime. Unlike most other components, Zookeeper performance does not improve with more nodes.
Make sure that Kafka and SolrCloud can connect to the Zookeeper nodes.
LDAP
We assume there is already an LDAP server you wish to authenticate against, so we don't link to the documentation here.
By default ACE will connect to LDAP on a host named ace-ldap. The
content service (which is where the authentication service resides)
must be configured with some details of your LDAP server and schema:
the base DN to search for users in, and the URL to the LDAP server. By
default, users are assumed to have the object class inetOrgPerson
with the login name in the uid attribute, but this is configurable.
# content-service.yml
login:
  # Configures the connection to the LDAP server.
  ldap:
    # userObjectClass is the name of the LDAP object class that represents
    # users.
    userObjectClass: inetOrgPerson
    # loginNameAttribute is the LDAP attribute where a user's login name is
    # stored.
    loginNameAttribute: uid
    # userSearchBaseDN is used as a base for all user searches.
    userSearchBaseDN: <baseDN>
    # Provider URL for the LDAP server.
    providerUrl: ldaps://<host>:636
    # Location of the file containing the LDAP credentials (the DN and
    # password used for read access to LDAP). Don't change this setting,
    # use Docker secrets instead.
    credentialsFile: file:/opt/ace/ldap-credentials.json
ldap-credentials.json:
{
  "securityPrincipal": "cn=read-user,dc=polopoly,dc=ninja",
  "securityCredentials": "123456"
}
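Since this file contains LDAP credentials, we recommend managing it as a Docker secret rather than a plain file. A minimal sketch, where the DN, password, and secret name are placeholders:

```shell
# Write the credentials file that the content service reads at
# /opt/ace/ldap-credentials.json. The DN and password are placeholders.
cat > ldap-credentials.json <<'EOF'
{
  "securityPrincipal": "cn=read-user,dc=example,dc=com",
  "securityCredentials": "change-me"
}
EOF
# In a swarm you would then store it as a Docker secret, e.g.:
#   docker secret create ace-ldap-credentials ldap-credentials.json
grep -q '"securityPrincipal"' ldap-credentials.json
```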
ACE Services
ACE is distributed as Docker images available on Docker Hub. By default the services expect to be able to find each other using the service names in this table.
| Image | Service Name | ACE Component |
|---|---|---|
| atexproducts/ace.kafka-connect | ace-kafka-connect | Kafka Connect |
| atexproducts/ace.file-service | ace-file-service | File Service |
| atexproducts/ace.content-service | ace-content-service | Content Service |
| atexproducts/ace.indexer | ace-indexer | Solr Indexer |
| atexproducts/ace.search-service | ace-search-service | Search Service |
| atexproducts/ace.image-service | ace-image-service | Image Service |
| atexproducts/ace.taxonomy-service | ace-taxonomy-service | Taxonomy Service |
| atexproducts/ace.file-delivery-service | ace-file-delivery-service | File Delivery Service |
All of the ACE services are implemented in Java, and it is possible to pass JVM arguments using the environment variable ACE_JAVA_OPTS. By default, the services are configured so that the JVM memory is constrained by the memory allocated to that service. For even stricter control of memory usage, consider passing the relevant JVM arguments, e.g. -Xmn, -Xms, -Xmx. More details on service-specific memory allocation are given below.
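For example, explicit heap bounds can be passed through ACE_JAVA_OPTS. The values and the swarm service name below are illustrative only; tune them per service:

```shell
# Explicit JVM heap bounds for one service; values are examples only.
ACE_JAVA_OPTS="-Xms512m -Xmx4g"
# In a swarm deployment this could be applied with, e.g.:
#   docker service update --env-add ACE_JAVA_OPTS="$ACE_JAVA_OPTS" ace_ace-content-service
echo "$ACE_JAVA_OPTS"
```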
Configure ACE
Network configuration
The services will attempt to find their dependencies by resolving known hostnames. (Inside the same Docker network, containers may find each other by their service name.)
Table of default ports used by the components
These are the ports used by the services. Use Docker port mapping with Docker Compose or similar to expose or map them on the Docker host.
| Component | ports, (encrypted ports) | Used by | More... |
|---|---|---|---|
| ZooKeeper | 2181 | Kafka, SolrCloud, Indexer | Nodes in a ZooKeeper ensemble use ports 2888 and 3888 to communicate with each other |
| Couchbase | 8091-8094, 11210 | Content Service, Kafka Connect | See the Couchbase documentation for an overview of ports used by Couchbase |
| Kafka | 9092 | Content Service, Kafka Connect, Indexer | |
| Kafka Connect | 8083 | | The Kafka Connect REST interface. Used locally to determine if the component has started. |
| LDAP | 389, (636) | Content Service | |
| SolrCloud | 8983 | Indexer, Search Service | |
| S3 | depends | File Service | The network configuration depends on the type of S3 that is used |
| File Service | 8080 | Content Service, File Delivery Service, Image Service | |
| APIs | | | |
| Search Service | 8080 | Content Service, Taxonomy Service, Ace Clients | |
| Content Service | 8080 | Indexer, Search Service, Taxonomy Service, Image Service, File Delivery Service, Ace Clients | |
| Image Service | 8080 | Ace Clients | |
| File Delivery Service | 8080 | Ace Clients | |
| Taxonomy Service | 8080 | Ace Clients |
Content Service
Docker Hub: Content Service
The content service requires access to Couchbase, Kafka, the file service and the search service.
Memory consumption for the content service is dominated by the in-memory content caches, especially the content version cache. With the default cache size settings, 4GB of memory for the JVM is appropriate. However, the exact amount of memory used by each content version is hard to predict, so the memory setting usually needs to be adjusted per project. If the cache sizes are increased, the JVM memory should be increased proportionally.
# docker-compose.yml
...
ace-content-service:
  ...
  deploy:
    resources:
      limits:
        memory: 4G
  ...
...
...
The content service needs the authentication secret key.
When scaling the content service, it is often better for performance to have a few instances with a lot of cache memory than many instances with little memory. In most installations this will be one of the most performance-intensive parts of the system, so it may need quite a few instances.
Couchbase
couchbase:
  # Location of the file containing the Couchbase credentials. Don't change
  # this setting, use Docker secrets instead.
  credentialsFile: file:/opt/ace/couch-credentials.json
  # connectionString lists the Couchbase nodes to connect to on startup. This
  # doesn't need to be exhaustive because Couchbase tells us which nodes are
  # in the cluster, but at least one of these nodes must be available on
  # startup.
  connectionString: couchbase://ace-couch,couchbase://ace-couch-2
- The `connectionString` is the list of Couchbase servers to connect to.
- Couchbase credentials are configured using the credentials file mapped using `credentialsFile`. Mount your custom credentials file into the configured location.
Example couch-credentials.json file:
{
  "bucket": "<couchbase bucket>",
  "username": "<couchbase username>",
  "password": "<couchbase password>"
}
Since the credentials file contains Couchbase credentials, we recommend using Docker secrets to manage this file.
Kafka
The Kafka brokers to connect to need to be listed in the configuration file.
kafka:
  # brokers contains a list of the Kafka brokers to connect to.
  brokers:
    - ace-kafka:9092
    - ace-kafka-2:9092
File Service
Docker Hub: File Service
The file service uses Amazon S3 or a compatible service. Access to such a service needs to be configured in the following way.
s3:
  # accessKey is the AWS access key.
  accessKey: ${ACCESS_KEY}
  # secretKey is the AWS secret key.
  secretKey: ${SECRET_KEY}
  # bucket to store files in.
  bucket: ${BUCKET}
  # region to store files in.
  region: ${REGION}
  # serviceEndpoint lets you use a non-AWS S3 implementation.
  serviceEndpoint: ${S3_ENDPOINT}
Since this file contains credentials we recommend using Docker secrets to manage the file.
The file service also needs the authentication secret key.
The file service does not require a lot of memory to operate; 256 MB should be enough.
# docker-compose.yml
...
ace-file-service:
...
deploy:
resources:
limits:
memory: 256M
...
...
File Delivery Service
Docker Hub: File Delivery Service
File Delivery Service Configuration
The file delivery service needs access to the content service and the file service.
The file delivery service does not require a lot of memory to operate; 256 MB should be enough.
# docker-compose.yml
...
ace-file-delivery-service:
...
deploy:
resources:
limits:
memory: 256M
...
...
Image Service
Docker Hub: Image Service
The image service needs access to the content service and the file service.
Since this file contains credentials we recommend using Docker secrets to manage the file.
The image service holds images in memory while rescaling them, so it may require approximately 2 × (size of largest image) × (number of concurrent requests) of memory. A good starting point is 2GB.
# docker-compose.yml
...
ace-image-service:
  ...
  deploy:
    resources:
      limits:
        memory: 2G
  ...
...
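The 2GB starting point follows from the estimate above. With illustrative numbers (and assuming the request count means concurrent requests):

```shell
# 2 x (size of largest image) x (number of concurrent requests),
# with illustrative values: 64 MB images, 16 concurrent requests.
LARGEST_IMAGE_MB=64
CONCURRENT_REQUESTS=16
ESTIMATE_MB=$((2 * LARGEST_IMAGE_MB * CONCURRENT_REQUESTS))
echo "${ESTIMATE_MB} MB"   # 2048 MB, i.e. roughly the suggested 2GB
```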
The image service uses a thread pool to do the actual scaling. By default it is configured to use up to 12 threads, but this should be set to approximately 1.25-1.5x the number of CPU cores that can be dedicated to the image service. The service also uses a queue for requests waiting for a thread to execute on, which needs to be sized appropriately for the expected load.
exec:
  # queueSize is the maximum length of the image scaling request queue. If
  # more requests are queued we reject the requests with errors.
  queueSize: 12
  # coreSize is the number of threads to run scaling on before queueing up
  # requests.
  coreSize: 12
  # maxSize is the maximum number of threads to run scaling on, when the
  # queue is full.
  maxSize: 12
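The 1.25-1.5x rule of thumb can be applied directly. For example, with 8 cores dedicated to the image service (an illustrative number):

```shell
# Size the scaling pool at ~1.5x the CPU cores dedicated to the service.
CORES=8
THREADS=$((CORES * 3 / 2))   # 1.5x, using integer arithmetic
echo "$THREADS"              # matches the default coreSize/maxSize of 12
```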
Indexer
Docker Hub: Indexer
The indexer needs the authentication secret key.
The indexer needs access to Kafka and a content service.
The indexer does not require a lot of memory to operate; 256 MB should be enough.
Each indexer can handle multiple collections (such as the standard setup with one internal and one public collection), each with a variant and a list of views from which it will index versions:
# indexer.yml
collections:
  # collectionName is the Solr collection the indexer will write to.
  - collectionName: internal
    views: [aceLatest]
    variant: aceIndexing
While you can run multiple instances of each indexer for resilience, doing so comes at a slight performance cost and gives no performance gain (the event topics have a single partition, so only one instance consumes at a time). It may therefore be better to rely on a watchdog that starts a new instance if the indexer stops working; the indexer keeps almost no local state, so it starts very quickly. Running more than two instances of an indexer is not recommended.
Run as re-indexer
The indexer can also be set up to act as a re-indexer, using one instance for each search collection (just as regular indexing is set up).
The re-indexer needs the authentication secret key.
An indexer container working as a re-indexer should be configured slightly differently compared to a normal indexer:
- It should listen to the Kafka topic aceReindexEvents
- It should have its batch size decreased
- It should allow indexing of old revisions
Example:
# indexer.yml
kafka:
  # Name of the re-indexer topic.
  topic: aceReindexEvents
  # batchSize controls how many messages are processed in a batch.
  batchSize: 32
# Indicates whether old revisions should be ignored or still indexed.
onlyIndexNewerRevisions: false
For more information on re-indexing of search collections, see the re-indexing documentation.
Kafka Connect
Docker Hub: Kafka Connect
Kafka Connect is not wholly an ACE component – it is a standard Kafka Connect with an ACE connector based on kafka-connect-couchbase. Because of this, configuration does not work like it does in all the other ACE services.
Kafka Connect needs to be able to access Couchbase, Kafka and Zookeeper.
The default configuration is located in /opt/couch-kafka.json:
{
  "name": "couch-kafka-connector-aceEvents",
  "config": {
    "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
    "tasks.max": "1",
    "connection.cluster_address": "ace-couch",
    "topic.name": "aceEvents",
    "connection.bucket": "cmbucket",
    "connection.username": "cmuser",
    "connection.password": "cmpasswd",
    "connection.timeout.ms": "2000",
    "use_snapshots": false,
    "dcp.message.converter.class": "com.atex.ace.indexing.connector.converter.CouchbaseEventConverter",
    "event.filter.class": "com.atex.ace.indexing.connector.filter.AceKafkaFilter"
  }
}
There are two ways to configure Kafka Connect: by mounting the configuration file, or by using the built-in configuration in combination with environment variables.
Mounting the configuration file
By mounting the entire /opt/couch-kafka.json file into the Kafka Connect container, you can take over the configuration entirely. In that case, use the default configuration (shown above) and change relevant details.
The properties connection.bucket, connection.username and connection.password should be changed to reflect the bucket and credentials used in Couchbase.
Since this file contains credentials we recommend using Docker secrets to manage the file.
Overriding specific default configuration details
An alternative to mounting the entire /opt/couch-kafka.json configuration file is to use the default configuration in combination with overrides from environment variables. If you do so, you must provide all four variables at once. The following fields from the configuration are overridable:
- `connection.cluster_address` is read from the environment variable `COUCHBASE_HOSTNAME` if present
- `connection.bucket` is read from the environment variable `COUCHBASE_BUCKET` if present
- `connection.username` is read from the environment variable `COUCHBASE_USERNAME` if present
- `connection.password` is read from the environment variable `COUCHBASE_PASSWORD` if present
Other configuration
To configure how to connect to Kafka and Zookeeper, two environment variables are used.
Both of these are using a comma-separated connection string format like host:port,host2:port2:
services:
  ace-kafka-connect:
    ...
    environment:
      ZOOKEEPER: "ace-zookeeper:2181,ace-zookeeper-2:2181,ace-zookeeper-3:2181"
      ACE_COUCH: "ace-couch:8091,ace-couch-2:8091,ace-couch-3:8091"
General configuration is read by Kafka Connect from the file /opt/ace-kafka-connect.properties. Below is the default developer configuration used in ACE:
group.id=ace-kafka-connect
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.flush.interval.ms=10000
offset.storage.file.filename=/tmp/koffsets
config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
bootstrap.servers=ace-kafka:9092
NOTE: To achieve fault tolerance and durability in a production installation, we recommend having the
bootstrap.servers property specify at least two of the Kafka instances:
...
bootstrap.servers=ace-kafka:9092,ace-kafka-2:9092,ace-kafka-3:9092
...
NOTE: To achieve fault tolerance and durability in a production installation, we recommend running at least three instances of the Kafka Connect container. Simply deploy three or more Docker containers using the same configuration (except for the exposed port). Also set the following three properties to a replication factor of at least three:
...
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
...
The Kafka Connect service should be OK with 256 MB memory, but if you experience memory or performance issues, feel free to increase this.
Kafka Connect reference documentation
Search Service
Docker Hub: Search Service
The search service needs access to SolrCloud and the content service. The search service needs credentials to read config, but the user it uses does not need any permissions beyond being able to log in.
The search service needs the authentication secret key.
The search service does not require a lot of memory to operate; 256 MB should be enough.
Taxonomy Service
Docker Hub: Taxonomy Service
Taxonomy Service Configuration
The taxonomy service needs access to the search service and the content service.
The taxonomy service needs the authentication secret key.
The taxonomy service does not require a lot of memory to operate; 256 MB should be enough.
Running the ACE services
The ACE services can be started in any order, but the prerequisite services (Couchbase, Kafka, Zookeeper, SolrCloud) should be up and running.
To start ACE in Docker Swarm using the example docker-compose file, simply run:
docker stack deploy --compose-file docker-compose.yml ace
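After deploying, you can check that all services have started. The commands below assume you are on a swarm manager node and used the stack name ace; the service name in the log example is illustrative:

```shell
# List the services in the stack and how many replicas of each are running.
docker stack services ace
# If a service fails to start, inspect its logs, e.g.:
#   docker service logs --tail 100 ace_ace-content-service
```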
Importing content
The recommended way to import the system and project content is to build a content.jar containing both, using the ace-maven-plugin (mvn ace:package). To do this, add the system content as a Maven dependency of the project content artifact.
Importing a jar file
To import content you must first authenticate. Then simply post the jar file to http://ace-content-service:8081/content/import with the authentication token in the X-Auth-Token header.
curl --data-binary @./content/target/content-0.9.2-SNAPSHOT-content.jar -XPOST -H "Content-Type: application/java-archive" -H "X-Auth-Token: $TOKEN" http://localhost:8081/content/import\?unsafeUpdate=true
Post installation
When an initial ACE installation has been completed, you should remember to change the passwords of any users remaining in the system password file (see authentication).