Resource History Concept
Work in progress
This document is incomplete and INTERNAL
1. Resources in unblu
In unblu, resources are pieces of data that are (usually) not contained directly in a visual but referenced from it by a URI. Typical resources are binary: images, documents (pdf, doc, etc.) or multimedia content (video, audio, etc.). Textual content such as a stylesheet (css) file can also be a resource, and in theory so can html itself.
There are some exceptions to this rule:
The following are typically contained directly in a visual and are then transformed into resources on the unblu server:
- styles in html attributes
- styles in
2. Resource storage and access
An unblu resource consists of two parts:
- The resource itself
- The data contained - called a blob (binary large object)
2.1. The "Resource" object
The resource object provides the following information:
- uuid representing the resource
- uri of the original resource
- mime type
- charset (if textual resource)
- state (can be
- origin (can be
- reference to a blob containing the actual data of the resource
2.2. The "blob" object
A "blob" contains the actual data of the resource. There are two kinds of blobs:
- basic blob (or just blob)
- typed blob (based on a basic blob)
In addition there exists a "dummy" blob, which is marked with a specific id:
This blob indicates that there is "no data available". It is used in resources whose blob is not available (yet).
2.2.1. Basic blob
Information provided by basic blobs:
- checksum (currently a CRC32 checksum)
- length (in bytes)
- creation date
- binary data
2.2.2. Typed blob
Information provided by typed blobs:
- same as blob plus:
- mime type
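The objects described above can be sketched as plain data classes. This is only an illustration of the listed fields, not the actual unblu classes: all class, field and function names are assumptions. The origin values are not listed in this document, so that field is left as an opaque string, and a missing blob (None) stands in for the "dummy" blob.

```python
import zlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Blob:
    """Basic blob: the binary data plus checksum, length and creation date."""
    data: bytes
    checksum: int      # CRC32 of the data
    length: int        # size in bytes
    created: datetime

    @classmethod
    def of(cls, data: bytes) -> "Blob":
        # Derive checksum and length from the data itself.
        return cls(data, zlib.crc32(data), len(data), datetime.now(timezone.utc))


@dataclass(frozen=True)
class TypedBlob(Blob):
    """Typed blob: same as a basic blob, plus a mime type."""
    mime_type: str = "application/octet-stream"


def typed(blob: Blob, mime_type: str) -> TypedBlob:
    """Build a typed blob on top of an existing basic blob."""
    return TypedBlob(blob.data, blob.checksum, blob.length, blob.created, mime_type)


@dataclass
class Resource:
    """Resource metadata with a reference to the blob holding the data."""
    uri: str                       # uri of the original resource
    mime_type: str
    charset: Optional[str] = None  # only set for textual resources
    state: str = "PENDING"         # assumed initial value
    origin: Optional[str] = None   # possible values not listed in this document
    blob: Optional[Blob] = None    # None stands in for the "dummy" blob
    uuid: UUID = field(default_factory=uuid4)
```

A resource can therefore exist before its data has arrived: it is created with a uri and a mime type, and the blob reference is filled in later.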
"Resource storage" is split into three areas:
- Resource store
- Blob store
- Resource table
The first two hold their respective objects (see above). The resource table is the lookup table used to find a resource when only its backend uri is known (but not its uuid).
By default, all three stores are located in memory. The resource store and the resource table have session scope, that is, their content is dropped when the unblu session ends. The blob store, on the other hand, is global, so identical resources used in multiple sessions are stored only once in memory.
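The scoping rules can be illustrated with a small in-memory sketch. The class and method names are hypothetical. Keying blobs by CRC32 checksum plus length is also an assumption made for brevity; a real store would need a collision-safe identity.

```python
import zlib
from typing import Dict, Optional, Tuple
from uuid import UUID, uuid4

BlobKey = Tuple[int, int]  # (CRC32 checksum, length): hypothetical key choice


class BlobStore:
    """Global store: identical content is kept only once."""

    def __init__(self) -> None:
        self._blobs: Dict[BlobKey, bytes] = {}

    def put(self, data: bytes) -> BlobKey:
        key = (zlib.crc32(data), len(data))
        self._blobs.setdefault(key, data)  # keep the first copy, ignore repeats
        return key

    def get(self, key: BlobKey) -> Optional[bytes]:
        return self._blobs.get(key)

    def __len__(self) -> int:
        return len(self._blobs)


class Session:
    """Session-scoped resource store and resource table."""

    def __init__(self, blobs: BlobStore) -> None:
        self.blobs = blobs
        self.resources: Dict[UUID, BlobKey] = {}  # resource store: uuid -> blob key
        self.table: Dict[str, UUID] = {}          # resource table: backend uri -> uuid

    def add(self, backend_uri: str, data: bytes) -> UUID:
        resource_id = uuid4()
        self.resources[resource_id] = self.blobs.put(data)
        self.table[backend_uri] = resource_id
        return resource_id

    def lookup(self, backend_uri: str) -> Optional[UUID]:
        """Resolve a resource when only its backend uri is known."""
        return self.table.get(backend_uri)

    def close(self) -> None:
        """Drop all session-scoped content; the global blob store is untouched."""
        self.resources.clear()
        self.table.clear()
```

Two sessions that add the same bytes under different uris end up with two resources each pointing at a single shared blob, and closing one session does not remove that blob.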
2.4. Resource request URI
When resource history is turned off, URIs arriving in a player point directly to the original image / element on the original backend webserver. The player requests such URIs directly.
When resource history is turned on, on the other hand, visuals arriving in a player do not contain URIs pointing to the original backend webserver. Instead, the URIs are converted to a specific format pointing to the unblu server.
Resource uri with resource history turned on:
Note that the resource-uuid is never sent to the unblu server when a player browser requests the resource, because the so-called "fragment" part of a URI is only used inside the browser. It is nevertheless part of our URIs because it is also appended to URIs contained in other resources, e.g. in a css. When the css is parsed on the server, the resource uuid can be extracted and the corresponding information (especially about inbound / outbound references) retrieved and processed. Once the css arrives in a player's browser, the browser requests the css without the resource uuid, which is not required at that point.
It is important that the player retrieves the blob and not the surrounding resource. Since blobs are stored only once, the player also has to retrieve each blob only once (and can then cache it locally). If many resources (e.g. many URIs) have the same content (blob), caching is extremely effective. In the extreme case the player can load a webpage faster than the recorder originally did. This can happen if the original webpage contains hundreds of images with differing URIs but identical data. In the player, all of those resources share the same blob and therefore the same URI, so the data is retrieved only once.
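The effect of the fragment can be shown with the Python standard library. The URI layout used here is purely hypothetical and is NOT the real unblu format (which is not reproduced in this document); it only assumes that the blob is identified before the "#" and the resource uuid after it.

```python
from urllib.parse import urldefrag


def request_uri(resource_uri: str) -> str:
    """Return what a browser actually requests: the URI without its fragment."""
    return urldefrag(resource_uri).url


# Hypothetical layout: <unblu server>/<blob id>#<resource uuid>
first = "https://unblu.example/blob/3610a686#11111111-1111-1111-1111-111111111111"
second = "https://unblu.example/blob/3610a686#22222222-2222-2222-2222-222222222222"

# Two different resources with the same blob yield the same request,
# so the browser cache can serve the second one without a round trip.
same_request = request_uri(first) == request_uri(second)
```

Under this assumption, the server-side css parser would still see the full URI including the fragment, while the browser's request and cache key contain only the part before it.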
2.5. Resource history
In unblu, resources are handled fundamentally differently depending on whether a feature called "resource history" is turned on or off. The purpose of the resource history (when turned on) is to store all resources visible at the recorder.
There are two main reasons for doing this:
- Check / validate / filter resources that will be displayed on the player
- Store the resource in the context of the session in order to have the resource available for later "re-play" - that is: replaying a session after it has been closed.
2.5.1. How to activate resource history
In your unblu properties file, add the following lines:
2.6. Resource flow
The following two diagrams show the flow of resources depending on whether the resource history is turned on or off.
2.6.1. Without resource history
Without resource history, resource URIs in visuals are left as-is and transferred unchanged from recorder to player. The resources themselves are neither transferred to the unblu server nor stored anywhere. Instead, the player requests each resource on demand directly from the original backend webserver.
Note that even though the URIs remain as they were in the recorder, they are checked on the unblu server and only let through if they correctly point to a resource on the backend webserver. Thus, it is not possible to "inject" a link to some obscure resource on some unknown webserver, or at least such a resource will not be requested by the player's browser.
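A check of this kind could look roughly like the following sketch. It is not the actual unblu validation: the function names are hypothetical, and it assumes that URIs are absolute and that "pointing to the backend webserver" means having the same scheme, host and port.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(uri: str) -> Optional[Tuple[str, str, int]]:
    """Return (scheme, host, port), or None if the URI is not absolute http(s)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in _DEFAULT_PORTS or parts.hostname is None:
        return None
    return (scheme, parts.hostname, parts.port or _DEFAULT_PORTS[scheme])


def points_to_backend(uri: str, backend: str) -> bool:
    """True if `uri` has the same scheme, host and port as `backend`."""
    try:
        origin = _origin(uri)
        return origin is not None and origin == _origin(backend)
    except ValueError:  # raised for malformed ports
        return False
```

For example, points_to_backend("https://www.example.com/img/a.png", "https://www.example.com") accepts the URI, while a URI on another host or with a non-http scheme such as "javascript:" is rejected.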
This is the fastest way to access resources.
2.6.2. With resource history
With resource history turned on, the unblu player only interacts with the unblu server. Nothing is requested from anywhere else. All resources must therefore be uploaded to the unblu server once they are discovered in a visual (e.g. as an image or a stylesheet link tag).
This configuration provides the maximum amount of security, since all resources have to be transferred from the backend webserver to the unblu server. If the recorder's browser sends a link to some other file, the player's browser will not be able to resolve that resource: it will request the resource from the unblu server, which has no such resource and returns a 404 not found.
Note that browser caching is a problem for this approach. Typically, resources are transferred to unblu e.g. from a reverse proxy through which all data to the end user passes. Usually, traffic is only monitored by unblu once an unblu session has been started. It may therefore well happen that an end user surfs on the website, retrieves images and stylesheets and has them stored in the browser cache from that moment on. Once the unblu session starts, these resources no longer pass through the reverse proxy and thus are not transferred to unblu. Unblu detects this and sends the recorder commands to re-request such files. However, this may take some time, so performance is typically much worse with resource history enabled than without.
Resource history is required if you want a session to be stored for later re-play (unblu re:play product, currently not available but planned).
2.7. Resource processing
Depending on its type, a resource is either "processed" before being used or used in its raw / original form. A typical (and currently the only) processor is the CSS processor.
2.7.1. CSS processor
The main purpose of the CSS processor is to identify URI reference locations and convert them to unblu resource URIs. The unblu server features two kinds of CSS processors:
- Full CSS parser based
- "simple" regex based
The full CSS parser based processor is used when resource history is turned on. Basically, it simulates a CSS parser as present in browsers and filters the CSS. It thus not only scans for URIs but also drops unknown / unsafe / problematic CSS rules.
The "simple" regex based CSS processor only scans for URIs and verifies / replaces them.
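The regex based approach can be sketched as follows. This is a minimal illustration, not the unblu processor: the pattern only covers url(...) references (not, for instance, @import with a bare string), and the convert callback that maps an original URI to its replacement is a hypothetical parameter.

```python
import re
from typing import Callable

# Matches url(...) with optional single or double quotes around the URI.
_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def rewrite_css_uris(css: str, convert: Callable[[str], str]) -> str:
    """Replace every url(...) reference in `css` with convert(original_uri)."""

    def _substitute(match: "re.Match[str]") -> str:
        quote, uri = match.group(1), match.group(2)
        return f"url({quote}{convert(uri)}{quote})"

    return _URL_RE.sub(_substitute, css)
```

Called with a callback that prefixes "/res/", the rule background: url("img/bg.png") becomes background: url("/res/img/bg.png"). A verifying variant would have the callback reject or blank out URIs that do not pass its check instead of converting them.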
2.7.2. Dependency Processing
Resources can have incoming (inbound) or outgoing (outbound) references. A CSS resource, for example, can have outbound references to background images. On the other hand, it can have inbound references from html files or other CSS resources.
The following illustration shows an example dependency tree. Any arrow going away is an outbound reference, whereas an arrow going "in" is an inbound one.
These references have an impact once a resource changes. Resources can have the following states:
- PENDING (a resource with a certain uri has been seen, e.g. in a visual, but is not present in the resource store. It is expected to arrive on the unblu server later on)
- REQUESTING (a resource should have arrived on the server but has not arrived so far. It is being requested from the unblu client again)
- REQUESTED (a missing resource has been requested)
- INVALID (a missing resource is invalid, e.g. not available, 404 or similar)
- MATERIALIZED (a resource's content has arrived but may need processing)
- DELIVERABLE (a resource's content has arrived and has been processed. It is ready for delivery)
If a resource changes its state, e.g. to DELIVERABLE, this typically includes a change in its data (that is, the contained blob has changed). If the blob changes, the URIs in visuals and in any dependent resources such as CSS files need to be updated. Thus, a status update always triggers a cascade of resource updates. In the simplest case there is nothing to do (no dependencies). In the above illustration, if a picture changes, it can lead to reprocessing of imported.css, default.css and the part of the visual where default.css was included into html.
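The cascade can be thought of as a walk along inbound references. The sketch below is an assumption about how such a walk might be organised, not the actual unblu implementation. The function name and the graph representation are hypothetical, and "picture.png" is a made-up name for the picture in the illustration.

```python
from collections import deque
from typing import Dict, List, Set


def affected_by(changed: str, inbound: Dict[str, Set[str]]) -> List[str]:
    """Return everything that must be reprocessed after `changed` gets a new blob.

    `inbound` maps a resource to the set of resources (or visuals) referencing it.
    The result is in breadth-first order, nearest referrers first.
    """
    seen = {changed}
    order: List[str] = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for referrer in sorted(inbound.get(current, ())):
            if referrer not in seen:  # guards against cyclic references
                seen.add(referrer)
                order.append(referrer)
                queue.append(referrer)
    return order


# Dependency chain from the example: picture <- imported.css <- default.css <- html
inbound_refs = {
    "picture.png": {"imported.css"},
    "imported.css": {"default.css"},
    "default.css": {"visual (html)"},
}
```

Here affected_by("picture.png", inbound_refs) yields imported.css, default.css and the html part of the visual, in that order. A resource without inbound references yields an empty list, which corresponds to the "nothing to do" case.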