Ard Schrijvers

Jul 19, 2013

A Bird's Eye Hippo CMS Architectural View Part 1: 10,000 foot view

Over the coming months I will write a series of blogs, starting from a 10,000 foot bird's eye view, that should help you understand Hippo's architectural stack. I will zoom in on different parts of the stack in more and more detail. The target audiences are sysadmins (infra), architects and developers.

Update: See A Bird's Eye Hippo CMS Architectural View Part 2: 1,000 foot view on the HST for part 2.

Disclaimer 1: For this blog, I reuse some existing documents written by others. If some part looks very similar to something you have read elsewhere, or even wrote yourself, that is quite possible. If the information I used is not publicly available, I will not be able to reference it. In general: I am guilty of plagiarism throughout all the upcoming blogs about Hippo architecture. The ideas are not mine, nor is most of the text. But I have tried to ask all the authors whether I could use their content and will try to give all of them credit. Note that this first blog about architecture is mainly taken from a document written by Bart v/d Schans.

Disclaimer 2: Don't see this blog as a single source of truth. When I write about architectural setups, I typically have the most common setups in mind. However, many customers have domain specific requirements which result in different setups. Also, this series of blogs represents my view of our stack: our stack is customizable in many areas and most customers choose domain specific solutions I do not even know about. This blog also does not explain, for example, upcoming replication setups from one cluster to another (using a DMZ or not).

Most of what I will write about the Hippo CMS architectural view concerns our open source stack. When I write about, or show architectural graphs of, enterprise closed source parts, like relevance / targeting using Couchbase, I will clearly indicate this.

The 10,000 foot view

Hippo CMS 7 architecture is a 3-Tier architecture consisting of a web-tier, an application-tier and a database-tier.

  1. The web-tier is a (set of) loadbalancer(s) that delegate requests to reverse proxies.
  2. The application-tier is where Hippo CMS 7 “lives”. Hippo CMS 7 is a pure Java application that can be run in a variety of application servers.
  3. In the database-tier all the data gets persisted.

Each of these tiers can be scaled individually. The web-tier can consist of a number of load balancers and reverse proxies, the application-tier can be clustered horizontally, and the database can be a single database, a cluster, or some master/slave or master/master high availability solution.

The Request Flow

In a typical load balanced and clustered setup a request comes in at one of the load balancers. The load balancer forwards the request to one of the reverse proxies. The reverse proxy analyzes the request and chooses between:

  1. Serve the request from cache
  2. Proxy the request to a Site Server
  3. Proxy the request to a CMS Server

In a common setup, we serve binaries from a caching proxy, like squid, varnish or mod_cache, and serve every html / xml / json response live from the applications. The applications themselves cache on different levels as well, including an embedded page cache, see Caching below. There are also customers that have a cache in the form of a CDN in front of the loadbalancers. Hippo CMS 7 supports Edge Side Includes for CDNs and caching proxies as well.
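The three-way choice the reverse proxy makes can be sketched in a few lines of Java. This is only an illustration of the decision described above, not Hippo or proxy code: the class name `ProxyRouting`, the list of cacheable extensions and the host name `cms.example.com` are all assumptions, and a real proxy would express this in its own configuration language.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the reverse proxy's routing decision; not Hippo code. */
final class ProxyRouting {
    enum Target { CACHE, SITE_SERVER, CMS_SERVER }

    // Assumption: file extensions we treat as cacheable binaries / static files.
    private static final Set<String> CACHEABLE =
            Set.of("png", "jpg", "gif", "pdf", "css", "js");

    // Assumption: the host name under which the CMS is reachable.
    private static final String CMS_HOST = "cms.example.com";

    static Target route(String host, String path) {
        if (CMS_HOST.equalsIgnoreCase(host)) {
            return Target.CMS_SERVER;      // CMS requests always go to a CMS Server
        }
        int dot = path.lastIndexOf('.');
        String ext = dot < 0 ? "" : path.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (CACHEABLE.contains(ext)) {
            return Target.CACHE;           // binaries are served by the caching proxy
        }
        return Target.SITE_SERVER;         // html / xml / json is rendered live
    }
}
```

With these assumptions, `ProxyRouting.route("www.example.com", "/news/today.html")` yields `SITE_SERVER`, while a request for `/binaries/logo.png` on the same host is answered from the cache.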

The Hippo CMS 7 Delivery Tier (HST site application), which is part of the Site Servers, has support for virtual hosting. This enables the HST site application to show different (sub)sites for requests from different domains / URLs. The virtual hosting configuration that takes care of this is stored in the repository and can be modified in running production environments. This enables you to add (sub)sites on the fly from blueprints without needing any change in the web servers (e.g. httpd config) or having to restart an application. Since the virtual hosting is done on application level, a typical Apache httpd (also see Configure Apache httpd web server for cms and site(s)) config looks as follows:

In the above example, the CMS server(s) is/are accessed over cms.example.com, and the Site server(s) over www.example.com, but also over *.example.com due to the ServerAlias config. If the ServerAlias were just * instead of *.example.com, then every request arriving at this Apache web server that is not for the domain cms.example.com would be delegated to the Site server. Since the Hippo delivery tier supports virtual hosting, the delivery tier will handle all the different domains. More about this later.
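To give a feel for what "virtual hosting on application level" means, here is a minimal sketch in Java of resolving a Host header to a (sub)site. It is not how the HST implements host matching; the class `VirtualHosts`, its methods and the site names are hypothetical, and the wildcard rule simply mimics the `*.example.com` alias discussed above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical host-to-site lookup; the real HST matching is more elaborate. */
final class VirtualHosts {
    // Insertion order is kept, so the first registered matching pattern wins.
    private final Map<String, String> siteByPattern = new LinkedHashMap<>();

    VirtualHosts add(String hostPattern, String siteName) {
        siteByPattern.put(hostPattern.toLowerCase(Locale.ROOT), siteName);
        return this;
    }

    Optional<String> resolve(String hostHeader) {
        String host = hostHeader.toLowerCase(Locale.ROOT);
        int colon = host.indexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon);   // drop an optional ":port" suffix
        }
        for (Map.Entry<String, String> entry : siteByPattern.entrySet()) {
            if (matches(entry.getKey(), host)) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    private static boolean matches(String pattern, String host) {
        if (pattern.equals("*")) {
            return true;                                  // catch-all
        }
        if (pattern.startsWith("*.")) {
            return host.endsWith(pattern.substring(1));   // e.g. ".example.com"
        }
        return pattern.equals(host);                      // exact host name
    }
}
```

Because the mapping lives in the application (in Hippo's case: in the repository), adding a new entry at runtime is enough to serve a new domain, which is why the web server config can stay as generic as a wildcard alias.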

Load balancing

There are no specific requirements for the load balancer except that it needs to be able to handle sticky sessions for the CMS servers. This means that once an HTTP session between a browser and an application server is established, all following requests from that browser are sent to the same application server instance. Sticky sessions can be achieved in various ways: by analyzing the JSESSIONID session cookie, by inserting special routing cookies, or by analyzing the source IP address of the HTTP request.
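As a rough sketch of two of these strategies, the Java snippet below first looks for a route suffix in the JSESSIONID (the `sessionid.route` form that Tomcat produces when a `jvmRoute` is configured) and otherwise falls back to hashing the source IP. The class `StickyBalancer` and the route names are hypothetical; real load balancers implement this in their own configuration, not in application code.

```java
import java.util.List;

/** Hypothetical sticky-session backend selection; for illustration only. */
final class StickyBalancer {
    private final List<String> routes;   // assumed backend names, e.g. "cms1", "cms2"

    StickyBalancer(List<String> routes) {
        this.routes = List.copyOf(routes);
    }

    /** jsessionId may be null when the browser has no session yet. */
    String choose(String jsessionId, String sourceIp) {
        if (jsessionId != null) {
            int dot = jsessionId.lastIndexOf('.');
            if (dot >= 0) {
                String route = jsessionId.substring(dot + 1);
                if (routes.contains(route)) {
                    return route;        // stick to the node that created the session
                }
            }
        }
        // No usable session cookie: the same source IP always maps to the same node.
        return routes.get(Math.floorMod(sourceIp.hashCode(), routes.size()));
    }
}
```

For example, with routes `cms1` and `cms2`, a request carrying `JSESSIONID=A1B2C3.cms2` is always sent to `cms2`, regardless of which IP address it comes from.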

The CMS is a stateful web application and requires sticky sessions.

The sites are preferably stateless, because that makes maintenance on one of the cluster nodes easier and lets the traffic be divided more evenly among the cluster nodes. Developing a stateless web site requires developers to have the mindset not to write any code that relies on statefulness of the application. Obviously, this means they should be aware that, for example, using HTTP sessions in the delivery tier makes the application stateful. Apart from requiring sticky sessions on the load balancer, statefulness also reduces the possibility to use caching proxies or the internal delivery tier page caching. More about page caching later, and also about how you can use the page cache while still having some parts of the page depend on session state.

So in general, we advise to enable sticky sessions for the CMS Server, and not for the Site Server.

5,000 foot view Site and CMS Server

Terminology

The Hippo CMS 7 application stack is composed of three main components:

  1. CMS editor/author/webmaster environment
  2. Delivery Tier (HST)
  3. Hippo Repository.

Whenever I talk about Hippo CMS 7, I refer to the entire stack. When I write CMS, I refer only to the CMS editor/author/webmaster environment. The delivery tier and Hippo Site Toolkit (HST) are synonyms: the HST is the framework through which developers can implement their sites / REST APIs. Hippo Repository, a JCR compliant repository on top of Apache Jackrabbit, forms the persistence layer together with the database. Typically, our Site Servers are deployed with an HST site application and a Hippo Repository application. CMS Servers in our default setup contain the CMS application, an HST site application and Hippo Repository. Unless explicitly mentioned otherwise, this is how I assume the Site and CMS Servers are set up.

In a typical standard setup, the Site Servers contain a site (HST) application and a Hippo Repository application, and the CMS Servers contain the CMS application with an embedded Hippo Repository plus a site application. Zooming in on the 10,000 foot bird's eye view above, the Site and CMS Servers look as follows for most of our default setups:

Above, the green boxes are the web applications. The Site Server in the example above has a site webapp and a separate Repository webapp. The CMS Server contains the CMS webapp with an embedded repository and a separate Site webapp.

An alternative to the above setup is to let the site webapp also have an embedded repository. In that case, the setup becomes:

It does not really matter whether you use the first or the second setup above. That is up to the customer's own preference.

Two more setup options are:

  1. Do not deploy a Site app (HST) in the CMS Server
  2. Do not make a distinction between Site Servers and CMS Servers: just deploy the entire distribution on all cluster nodes.

More about these setups in later blogs hopefully.

Clustering

Hippo CMS 7 can be clustered in a scale out or horizontal fashion. This means that if the load increases more application instances can be added to the cluster. The number of Site servers and CMS servers can be scaled independently. The part that all applications have in common is the Hippo Repository. The clustering is handled at the repository level. The clustering does not require an extra communication channel like JMS, JGroups or AMQP. All communication between the cluster instances is handled through the database. In summary:

  1. No clustering of the application servers themselves is required.
  2. The repositories (application servers) connect to the same database(s).
  3. All the repositories are part of one cluster.
  4. All site and CMS nodes have an identical view of the repository.
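How can nodes stay in sync when the database is the only shared channel? A minimal sketch in Java, assuming a journal-style approach in the spirit of Jackrabbit's database journal: every node appends its changes to a shared journal and regularly reads the records it has not seen yet. The classes `SharedJournal` and `ClusterNode` are hypothetical, the journal is an in-memory list standing in for a database table, and the example is single-threaded (a real implementation has to lock the journal while writing).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the journal table in the shared database. */
final class SharedJournal {
    private final List<String> records = new ArrayList<>();

    /** Appends a change and returns its revision number (1-based). */
    synchronized long append(String nodeId, String change) {
        records.add(nodeId + ":" + change);
        return records.size();
    }

    /** Returns all records written after the given revision. */
    synchronized List<String> readAfter(long revision) {
        return new ArrayList<>(records.subList((int) revision, records.size()));
    }
}

/** Hypothetical cluster node that only talks to the shared journal. */
final class ClusterNode {
    private final String id;
    private final SharedJournal journal;
    private final List<String> applied = new ArrayList<>();
    private long localRevision = 0;

    ClusterNode(String id, SharedJournal journal) {
        this.id = id;
        this.journal = journal;
    }

    void write(String change) {
        sync();                                   // catch up before writing
        localRevision = journal.append(id, change);
        applied.add(id + ":" + change);
    }

    /** Polls the journal; a real node would update its caches and indexes here. */
    void sync() {
        List<String> missed = journal.readAfter(localRevision);
        applied.addAll(missed);
        localRevision += missed.size();
    }

    List<String> view() {
        return List.copyOf(applied);
    }
}
```

After a CMS node writes a change and a Site node calls `sync()`, both report the same `view()`, without any direct node-to-node communication.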

The following diagram shows this flow:

Caching

There are four important default caches in a typical setup:

  1. Web caching: this is handled in the web tier either by the reverse proxy like Apache or by dedicated caching servers like varnish or squid. Typically this is used for static content like javascript files, stylesheets and images.
  2. Repository caching: this is called the BundleCache. The BundleCache caches content from the database so that it can quickly serve content without querying the database every time. The BundleCache is a memory cache. Its size can be configured, and it is preferable to increase the size when the amount of content in the database increases. The size varies from 64MB for very small installations to 512MB for large ones. The following table gives an indication about the sizing for the BundleCache.
  3. The binaries cache: this cache is part of the delivery tier (HST) and caches binaries and assets like pdfs and images. The binaries cache can be configured to serve from memory (default) or from disk.
  4. The page cache: this cache is part of the delivery tier (HST) and caches rendered (hotspot) pages per URL space and/or page definition. It works in combination with targeting/relevance, and also contains a thundering herd protection. Developers need to take care not to rely on stateful (HTTP session) logic that influences the rendered page if they want to use page caching. If they need session state, loading the parts of the page that need it through asynchronous Ajax requests or Edge Side Includes (ESI) makes the page cache usable again.
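"Thundering herd protection" means that when many requests for the same uncached page arrive at once, only one of them renders the page and the others wait for its result. The following Java sketch shows one common way to get that behavior with a map of futures. It is not the HST page cache implementation: the class `PageCache` and its renderer function are hypothetical, and eviction and expiry are left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical page cache with thundering herd protection; no eviction or TTL. */
final class PageCache {
    private final ConcurrentMap<String, CompletableFuture<String>> pages =
            new ConcurrentHashMap<>();
    private final Function<String, String> renderer;   // assumed: url -> rendered page

    PageCache(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    String get(String url) {
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> existing = pages.putIfAbsent(url, mine);
        if (existing != null) {
            return existing.join();        // someone else renders (or rendered): wait
        }
        try {
            String html = renderer.apply(url);   // only one thread per url gets here
            mine.complete(html);
            return html;
        } catch (RuntimeException e) {
            pages.remove(url, mine);             // do not cache failures
            mine.completeExceptionally(e);
            throw e;
        }
    }
}
```

The key point is that the placeholder future is registered before rendering starts, so concurrent requests for the same URL find it and block on `join()` instead of all hitting the repository at the same time.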

1,000 foot view Delivery Tier

In the next blog about Hippo's architectural bird's eye view I will zoom in on the Delivery Tier (HST): how the request flows through the application, how the HST container Spring Components are assembled and how you can customize them, how the HST interacts with the repository, how link-rewriting (cross domain and cross channel) works, how to create your own REST APIs (we don't only expose default REST responses but enable developers to easily build their own), and more. In that blog I will explain the graph below in more detail. Stay tuned for the next blog.
