Full-text search in Unblu

Installation options

You can run the search engine for Unblu Spark yourself or use a cloud service provider to run it for you as a service.

If you use the Unblu Cloud, Unblu configures the search engine for you.

If you run Unblu Spark in a public cloud, you can use the managed Elasticsearch and/or OpenSearch services that AWS, Azure, and Google Cloud offer. Alternatively, you can deploy a search engine using a Kubernetes operator. For more information, refer to your cloud service provider's documentation.

If you run Unblu Spark on-premises, you can deploy a search engine in your Kubernetes cluster. For more information, refer to the documentation of your search engine of choice.

Indexing

Unblu Spark uses a separate index for each Unblu account. A deployment of Unblu Spark thus has one conversation index for each account running on it.

Once you’ve created an index, Unblu Spark sends new messages in ongoing conversations to the search engine when it adds them to the Unblu database. Besides the message itself, Unblu also sends conversation metadata and context person labels to the search engine. If any of this information changes, all the conversation’s messages in the search engine are updated to reflect the change immediately.

Despite other data being sent to the search engine during indexing, only the text of the message itself is searchable.
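To make this indexing behavior concrete, here is a minimal in-memory sketch. It assumes a simple document layout with hypothetical field names (`message_id`, `conversation_id`, `text`, `metadata`, `person_labels`) and a hypothetical `AccountIndices` class standing in for the search engine; it isn't Unblu's implementation or data model. It shows one index per account, metadata changes propagated to every message of a conversation, and searches that only match the message text. A plain substring match stands in for the search engine's real text analysis.

```python
from dataclasses import dataclass, field


@dataclass
class MessageDocument:
    # Hypothetical document layout: only `text` is searched; the other
    # fields are stored with the message but never matched.
    message_id: str
    conversation_id: str
    text: str
    metadata: dict = field(default_factory=dict)
    person_labels: list = field(default_factory=list)


class AccountIndices:
    """In-memory stand-in for a search engine with one index per account."""

    def __init__(self):
        self._indices = {}  # account ID -> {message ID -> MessageDocument}

    def index_message(self, account_id, doc):
        # Called when a new message is added to the database.
        self._indices.setdefault(account_id, {})[doc.message_id] = doc

    def update_conversation(self, account_id, conversation_id, metadata, labels):
        # A metadata or label change is applied to every message
        # of the affected conversation.
        for doc in self._indices.get(account_id, {}).values():
            if doc.conversation_id == conversation_id:
                doc.metadata = dict(metadata)
                doc.person_labels = list(labels)

    def search(self, account_id, term):
        # Only the message text is searched, never metadata or labels.
        term = term.lower()
        return [
            doc.message_id
            for doc in self._indices.get(account_id, {}).values()
            if term in doc.text.lower()
        ]
```

Searching for a word that only appears in a conversation's metadata returns no results, because only the message text is searched. Each account's index is also isolated from the others.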

Indexed messages

When indexing a conversation, all the messages in the conversation are added to the index as documents. This includes not just chat messages but system messages as well.

Reindexing

Reindexing only takes place when it's triggered. It runs as a batch process, executed individually for each account. The batch process creates a new index and adds all the items to the new index.

During reindexing, the old index is still available for searches, and new messages are added to both the old and the new index, so full-text search remains fully functional.
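The reindexing flow can be sketched as follows. This is a minimal model under stated assumptions, not Unblu's implementation: the hypothetical `AccountIndexManager` keeps dictionaries keyed by message ID for the database and for the two indices. `reindex()` is a generator that pauses after each batch, which lets the example add a message while reindexing is in progress. Searches use the old index until the last batch has been copied, at which point the new index takes over.

```python
class AccountIndexManager:
    """Hypothetical model of one account's data during reindexing."""

    def __init__(self):
        self.database = {}  # message ID -> text (source of truth)
        self.active = {}    # old index, used for searches
        self.new = None     # new index, only exists while reindexing

    def add_message(self, msg_id, text):
        self.database[msg_id] = text
        self.active[msg_id] = text
        if self.new is not None:
            # Dual write: new messages go to both indices while reindexing.
            self.new[msg_id] = text

    def search(self, term):
        # Searches always go to the active (old) index.
        return sorted(i for i, t in self.active.items() if term in t)

    def reindex(self, batch_size=100):
        """Copy existing messages in batches, yielding after each batch."""
        self.new = {}
        ids = list(self.database)  # snapshot of existing messages
        for start in range(0, len(ids), batch_size):
            for msg_id in ids[start:start + batch_size]:
                # Keyed by ID, so writing a message twice is harmless.
                self.new[msg_id] = self.database[msg_id]
            yield start
        # All batches copied: switch searches to the new index.
        self.active, self.new = self.new, None
```

Keying documents by message ID is what makes the dual write safe in this model: if the batch copy and a live write both add the same message, the second write overwrites the first instead of creating a duplicate.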

By default, reindexing isn’t triggered automatically. You can, however, configure Unblu to reindex messages automatically if the index mappings are out of date or incompatible. For more information, refer to Configuring Unblu full-text search.

Only users with the SUPER_ADMIN role can trigger reindexing.

Document schemas

Each index uses a document schema that describes the individual fields of the documents it contains. This makes it possible to define different indexing behavior for different fields. For example, the search engine might use stemming on the message content but not on the conversation ID.

As Unblu evolves, so does the document schema. To keep track of these changes, each schema is assigned a semantic version number.
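As an illustration of per-field behavior and schema versioning, the Python dictionary below shows what an index-creation body for Elasticsearch or OpenSearch could look like. The field names, the `schema_version` key, and the version number are hypothetical, not Unblu's actual schema. The built-in `english` analyzer applies stemming to the message text, the `keyword` type indexes the conversation ID as a single exact value, and the version is recorded in the mapping's `_meta` section, which the search engine stores but doesn't interpret.

```python
# Hypothetical semantic version of the message index schema.
SCHEMA_VERSION = "2.1.0"

# Body that could be sent when creating an index.
MESSAGE_INDEX_BODY = {
    "mappings": {
        # Custom metadata: stored with the mapping, ignored by the engine.
        "_meta": {"schema_version": SCHEMA_VERSION},
        "properties": {
            # Full-text field: analyzed with English stemming.
            "text": {"type": "text", "analyzer": "english"},
            # Exact-value field: no analysis, no stemming.
            "conversationId": {"type": "keyword"},
        },
    }
}
```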

When you launch Unblu, the Collaboration Server compares the version number of the schema used by the indices in the search engine with that of the product schema, that is, the schema specified in the Collaboration Server. The outcome of the check is one of the following:

UP_TO_DATE: The product schema and the schema in use have the same version number.
COMPATIBLE: The product schema and the schema in use differ in the patch version number. Unblu can both read from and write to the index. You should, however, consider reindexing.
SHOULD_REINDEX: The product schema and the schema in use differ in the minor version number. Unblu can read from the index but can't write to it, so new messages won't be added to the index. Full-text search is only available for messages that were already indexed.
MUST_REINDEX: The product schema and the schema in use differ in the major version number. Unblu can neither read from nor write to the index. Full-text search will not be available.
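The version comparison can be sketched as follows, assuming versions are plain `MAJOR.MINOR.PATCH` strings. The function and enum names are hypothetical; the classification follows the four outcomes listed above, with the most significant differing component deciding the result. The sketch doesn't distinguish between an index schema that is older and one that is newer than the product schema, since the text doesn't say how that case is handled.

```python
from enum import Enum


class SchemaStatus(Enum):
    UP_TO_DATE = "UP_TO_DATE"
    COMPATIBLE = "COMPATIBLE"
    SHOULD_REINDEX = "SHOULD_REINDEX"
    MUST_REINDEX = "MUST_REINDEX"


def parse_version(version):
    # Unpacking enforces exactly three components: MAJOR.MINOR.PATCH.
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def check_schema(product_version, index_version):
    product = parse_version(product_version)
    in_use = parse_version(index_version)
    if product[0] != in_use[0]:
        return SchemaStatus.MUST_REINDEX    # major differs
    if product[1] != in_use[1]:
        return SchemaStatus.SHOULD_REINDEX  # minor differs
    if product[2] != in_use[2]:
        return SchemaStatus.COMPATIBLE      # patch differs
    return SchemaStatus.UP_TO_DATE


def can_read(status):
    return status is not SchemaStatus.MUST_REINDEX


def can_write(status):
    return status in (SchemaStatus.UP_TO_DATE, SchemaStatus.COMPATIBLE)
```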

The result of the check is stored in the Unblu database along with the version of Unblu Spark.

The check is carried out when the first instance of Unblu Spark in a cluster is launched.
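One way to read the last two points is that the check runs once per Unblu Spark version, and that instances launched later reuse the stored result. The sketch below assumes exactly that. The `SchemaCheckStore` class and its method are hypothetical, and a process-local lock stands in for whatever cluster-wide coordination Unblu actually uses.

```python
import threading


class SchemaCheckStore:
    """Hypothetical stand-in for the stored schema check results,
    keyed by Unblu Spark version."""

    def __init__(self):
        self._results = {}
        # Process-local lock; models "only one instance runs the check".
        self._lock = threading.Lock()

    def get_or_run(self, spark_version, run_check):
        with self._lock:
            if spark_version not in self._results:
                # First instance for this version: run and store the check.
                self._results[spark_version] = run_check()
            # Later instances reuse the stored result.
            return self._results[spark_version]
```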

Data synchronicity between Unblu Spark and the search engine

There’s no straightforward way to ascertain whether the data in the Unblu database and in the search engine index are out of sync with one another.

If you discover discrepancies between the two data sets, you must reindex the data in the database.

Filtering search results

When a user searches through chat messages, the search engine returns all the messages that match the search term. These results may include messages that the user isn't authorized to view. The Collaboration Server therefore runs a process referred to as "post-filtering" to keep only the messages that the user is entitled to see.

As a result of post-filtering, the number of results actually displayed to the user can differ from the number of results the search engine returned. In the most extreme case, the user may not be allowed to view any of the results returned in one batch. This has a number of consequences:

Post-filtering breaks any pagination that the search engine provides. To work around this, the Unblu UI uses infinite scrolling.
The total number of results can't be determined cheaply: to count them, Unblu would have to retrieve all the matching messages, post-filter them, and then count the messages that remain.
If the worst case arises and the user isn't allowed to view any of the messages returned in a set of search results, the user must nevertheless actively scroll to the end of the search results. This triggers the retrieval of the next set of results.

If the user isn’t allowed to see any of the messages in the next set of results, they must actively scroll to the end of the results again to trigger the next call to the search engine.

Search function availability

At present, full-text search across multiple conversations is only available in the Agent Desk. The web API, JavaScript APIs, and mobile SDKs don’t support full-text search.