
After covering Elastic Search's main features, how to create an Elastic Search cluster, and the comparison of Liferay, Solr, and Elastic Search, here is the most interesting part.

Liferay and Elastic Search integration

As far as I know, no one had integrated Elastic Search and Liferay when this project started, roughly one year ago. It was a challenging opportunity that touched many areas of Liferay.

The Elastic Search web plugin for Liferay acts as a bridge between Liferay's search portlets and the Elastic Search server, just as the Solr web plugin for Liferay does. It is built for Liferay 6.1 and Elastic Search 0.90.2 and newer, and it has been running for more than six months without problems.

Important features:

1. Use of Jest to manage REST access to the Elastic Search server.

I used version 0.0.4 of Jest. The plugin manages access through a Jest client obtained from a singleton class that returns the client on each request. The default configuration is minimal: the client only needs the URL of an Elastic Search server to connect to. The client is fully configurable via portlet.properties, including multithread control and node discovery.

##
## Connection Parameters
##
 
#
# Connection URL
elasticsearch.connection.url = http://localhost:9200
 
#
# Disable factory use in order to use just one client with multithread
# The client has connection pool and uses memory efficiently 
#
elasticsearch.connection.factory.disabled = true
 
#
# Multithread values
# Multithread enabled
#
elasticsearch.connection.multithread.enabled = true
 
#
# Pool client max total connection
#
elasticsearch.connection.multithread.connection.maxTotal = 500
 
#
# Default pool client max total connection per route
#
elasticsearch.connection.multithread.connection.route.defaultMaxTotal = 100
 
#
# Enable node auto discovery
# Auto discovery is used in large Elastic Search clusters 
# or if it is needed zero downtime in case of change/close of any node.
# This feature consumes one thread for pinging and getting nodes information.
#
elasticsearch.connection.node.discovery.enabled = false
 
#
# Node discovery frequency
#
elasticsearch.connection.node.discovery.frequency = 5
 
#
# Node discovery timeunit
# Valid values: minutes, seconds, milliseconds
#
elasticsearch.connection.node.discovery.timeunit = seconds
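The per-request singleton described above can be sketched like this (a minimal, stdlib-only illustration; the class names are hypothetical, and in the real plugin the wrapped object would be a Jest client built from the elasticsearch.connection.* properties):

```java
// Hypothetical sketch of the plugin's client singleton: every caller
// ("petition") gets the same lazily created client instance,
// thread-safe via double-checked locking on a volatile field.
public final class ElasticsearchClientHolder {

    // Stand-in for the Jest client; here it only records the URL.
    public static final class Client {
        private final String url;
        Client(String url) { this.url = url; }
        public String getUrl() { return url; }
    }

    private static volatile Client instance;

    private ElasticsearchClientHolder() {}

    // Returns the shared client, creating it on the first request.
    public static Client getClient(String connectionUrl) {
        if (instance == null) {
            synchronized (ElasticsearchClientHolder.class) {
                if (instance == null) {
                    instance = new Client(connectionUrl);
                }
            }
        }
        return instance;
    }
}
```

Disabling the factory (elasticsearch.connection.factory.disabled = true) corresponds to this single-shared-client behavior: one client with a connection pool instead of a new client per request.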

2. Plugin installation and Indexing.

Once the portlet is deployed (it is a Liferay plugin, so copying the WAR into the deploy folder is all that is needed), it works much like Solr: an administrator reindexes the portal from Liferay's control panel, and from then on document indexing is automatic. If a search is performed after installing the plugin but before reindexing, Liferay throws an error; this is normal behavior.

Since Elastic Search has some custom features, after several discussions we decided to manage indexing from Liferay into Elastic Search with these premises:

a) The index name in Elastic Search is the companyId. It is also possible to define a prefix for the index name, which helps identify Liferay's indices when the Elastic Search server manages many different ones. This is configurable via portlet.properties:

##
## Index definition
##
 
#
# Put here the string you want to be attached to the Liferay's index name
# This is used in case there is a name clash with existing indices
#
elasticsearch.index.prefix = liferay.

b) The type in Elastic Search is the entryClassName. When the document to index has no entryClassName (as is the case, for example, with portlets), a custom type name is used instead; this custom type name is configurable via portlet.properties:

#
# The object type is defined, by default, by the entryClassName.
# If it doesn't exist, the default type name is used
#
elasticsearch.type.default.name = 0

c) Indices can be created automatically if they don't exist, but this feature can be disabled in the server configuration, so the plugin has to know about it.

#
# If automatic index creation is disabled in the server, set this to false
#
elasticsearch.index.automatic = false
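Putting premises (a) and (b) and the prefix option together, the index and type a document ends up in can be derived as follows (a sketch; the class and method names are hypothetical, and companyId 10154 is just an illustrative value):

```java
// Illustrative sketch of the naming premises above (hypothetical helper).
public class EsNaming {

    // Premise (a): the index name is the companyId, optionally prefixed.
    static String indexName(String prefix, long companyId) {
        return prefix + companyId;
    }

    // Premise (b): the type is the entryClassName; when the document
    // has none, the configured default type name is used instead.
    static String typeName(String entryClassName, String defaultTypeName) {
        return (entryClassName == null || entryClassName.isEmpty())
                ? defaultTypeName
                : entryClassName;
    }
}
```

For example, with the properties above, a BlogsEntry of company 10154 would be indexed into index "liferay.10154" with type "com.liferay.portlet.blogs.model.BlogsEntry", and a document without entryClassName would get type "0".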

3. Automatic creation of mappings and templates.

When indexing, mappings and templates are really useful for defining how the index is managed. Although automatic mapping and template creation is possible, some things should be defined explicitly for the plugin to work properly.

a) Mappings

I used SearchSchemer to translate the mapping from Solr's schema to Elastic Search. However, I didn't see much difference between supplying explicit mapping files and delegating the mapping to Elastic Search's automatic mapping once the Liferay server is indexed (how could it be otherwise!). So explicit mapping is, again, optional.

##
## Mappings and templates
##

#
# If mapping is automatic (elasticsearch.mapping.automatic = true) 
# the mapping is NOT added from configuration files
# it is added automatically by Elastic Search
#
elasticsearch.mapping.automatic = true

#
# Specific mappings 
#
elasticsearch.mapping.specific.file[com.liferay.portal.model.Organization]=com/xtivia/portal/search/elasticsearch/resources/mapping/organization_mapping.json
elasticsearch.mapping.specific.file[com.liferay.portal.model.User]=com/xtivia/portal/search/elasticsearch/resources/mapping/user_mapping.json
elasticsearch.mapping.specific.file[com.liferay.portlet.bookmarks.model.BookmarksEntry]=com/xtivia/portal/search/elasticsearch/resources/mapping/bookmark_mapping.json
elasticsearch.mapping.specific.file[com.liferay.portlet.blogs.model.BlogsEntry]=com/xtivia/portal/search/elasticsearch/resources/mapping/blog_mapping.json
elasticsearch.mapping.specific.file[com.liferay.portlet.calendar.model.CalEvent]=com/xtivia/portal/search/elasticsearch/resources/mapping/calendar_mapping.json
elasticsearch.mapping.specific.file[com.liferay.portlet.documentlibrary.model.DLFileEntry]=com/xtivia/portal/search/elasticsearch/resources/mapping/file_mapping.json
elasticsearch.mapping.specific.file[com.liferay.portlet.journal.model.JournalArticle]=com/xtivia/portal/search/elasticsearch/resources/mapping/article_mapping.json
elasticsearch.mapping.specific.file[com.liferay.portlet.messageboards.model.MBMessage]=com/xtivia/portal/search/elasticsearch/resources/mapping/message_mapping.json
elasticsearch.mapping.specific.file[com.liferay.portlet.softwarecatalog.model.SCProductEntry]=com/xtivia/portal/search/elasticsearch/resources/mapping/scproduct_mapping.json
elasticsearch.mapping.specific.file[com.liferay.portlet.wiki.model.WikiPage]=com/xtivia/portal/search/elasticsearch/resources/mapping/wiki_mapping.json

So it is possible to delegate mapping to Elastic Search (I really recommend this if you use a template, as it is a better way to avoid conflicts), but the Elastic Search administrator can also create specific mappings. One mapping should be defined per type, with the elasticsearch.mapping.specific.file[TYPE_NAME]=MAPPING_FILE property.
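A file referenced this way contains ordinary Elastic Search mapping JSON for that type. As a rough illustration (the field choices here are hypothetical, not the plugin's actual user_mapping.json):

```json
{
  "com.liferay.portal.model.User": {
    "properties": {
      "screenName":   { "type": "string", "index": "not_analyzed" },
      "emailAddress": { "type": "string", "index": "not_analyzed" }
    }
  }
}
```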

b) Templates

I suggest using templates by default to hold custom definitions (such as multi-language support with custom analyzers).

As with mappings, automatic templating can be enabled (recommended), and, as with index names, a prefix can be added to the template name to make it easier to identify in the Elastic Search server:

#
# If true, the template names defined with elasticsearch.template.names
# will have the prefix defined in elasticsearch.index.prefix
#
elasticsearch.template.with.prefix = true

#
# If templating is enabled (elasticsearch.template.enable = true),
# templates will be loaded when the portlet starts
#
elasticsearch.template.enable=false

When automatic templating is enabled, the defined templates are loaded into Elastic Search when the plugin starts. This is the default behavior.

To define which templates are loaded, you must specify the files that contain them; more than one can be loaded. The elasticsearch.template.names property is used for this. Here is an example:

#
# Template names
# One name per template; the template path is defined in elasticsearch.template.name[template_name]
#
elasticsearch.template.names=general_template
elasticsearch.template.name[general_template]=com/xtivia/portal/search/elasticsearch/resources/templates/general_template.json

To use more than one template, the elasticsearch.template.names property can hold several identifiers, separated by commas.

elasticsearch.template.names=NAME1,NAME2

The names NAME1 and NAME2 identify the properties that store the paths to the templates loaded into the Elastic Search server:

elasticsearch.template.name[NAME1]=PATH_TO_TEMPLATE

I strongly recommend defining at least a default template like this one:

{
  "template" : "liferay.*",
  "mappings" : {
    "_default_" : {
      "dynamic_templates" : [{
        "Entry Class Name" : {
          "match": "entryClassName",
          "match_mapping_type": "string",
          "mapping": {
            "type": "string",
            "index": "not_analyzed",
            "store": "yes"
          }
        }
      }]
    }
  }
}

This is necessary to avoid a problem I experienced with Elastic Search and Liferay's faceted search. Liferay manages object types (entryClassName) with names such as "com.liferay.portlet.journal.model.JournalArticle", but Elastic Search returns "com.liferay.portlet.journal.model.journalarticle" as the facet value; since Liferay's comparison is case sensitive, the facet is not recognized and is not presented. Other solutions would be to create a hook in Liferay (modifying a single line of code to use an .equalsIgnoreCase() comparison) or to modify Liferay's core code, but the template solution is less intrusive.

Regarding analyzers, Liferay uses the standard analyzer and the pattern analyzer by default, but for some fields custom analyzers can be defined, and these should be declared in templates. Use the table below as a reference:

| Field | Liferay Analyzer Class | ES Analyzer |
|---|---|---|
| assetCategoryTitles* | com.liferay.portal.search.lucene.LikeKeywordAnalyzer | - |
| assetTagNames | com.liferay.portal.search.lucene.LikeKeywordAnalyzer | - |
| entryClassName | org.apache.lucene.analysis.KeywordAnalyzer | keyword |
| extension | org.apache.lucene.analysis.KeywordAnalyzer | keyword |
| installedVersion | org.apache.lucene.analysis.KeywordAnalyzer | keyword |
| layoutUuid | org.apache.lucene.analysis.KeywordAnalyzer | keyword |
| license | com.liferay.portal.search.lucene.LikeKeywordAnalyzer | - |
| path | org.apache.lucene.analysis.KeywordAnalyzer | keyword |
| status | org.apache.lucene.analysis.KeywordAnalyzer | keyword |
| structureId | org.apache.lucene.analysis.KeywordAnalyzer | keyword |
| tag | com.liferay.portal.search.lucene.LikeKeywordAnalyzer | - |
| templateId | org.apache.lucene.analysis.KeywordAnalyzer | keyword |
| treePath | com.liferay.portal.search.lucene.LikeKeywordAnalyzer | - |
| type | org.apache.lucene.analysis.KeywordAnalyzer | keyword |
| userName | com.liferay.portal.search.lucene.LikeKeywordAnalyzer | - |
| *_ar | org.apache.lucene.analysis.ar.ArabicAnalyzer | arabic |
| *_de_DE | org.apache.lucene.analysis.de.GermanAnalyzer | german |
| *_el_GR | org.apache.lucene.analysis.el.GreekAnalyzer | greek |
| *_fa_IR | org.apache.lucene.analysis.fa.PersianAnalyzer | persian |
| *_fr_[A-Z]{2} | org.apache.lucene.analysis.fr.FrenchAnalyzer | french |
| *_ja_JP | org.apache.lucene.analysis.cjk.CJKAnalyzer | cjk |
| *_ko_KR | org.apache.lucene.analysis.cjk.CJKAnalyzer | cjk |
| *_nl_NL | org.apache.lucene.analysis.nl.DutchAnalyzer | dutch |
| *_pt_BR | org.apache.lucene.analysis.br.BrazilianAnalyzer | brazilian |
| *_ru_RU | org.apache.lucene.analysis.ru.RussianAnalyzer | russian |
| *_zh_CN | org.apache.lucene.analysis.cjk.CJKAnalyzer | cjk |
| *_zh_TW | org.apache.lucene.analysis.cjk.CJKAnalyzer | cjk |
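For example, the *_de_DE row of the table could be declared in a template with a dynamic template entry like this (a sketch using the same dynamic_templates format as the default template shown earlier):

```json
{
  "template" : "liferay.*",
  "mappings" : {
    "_default_" : {
      "dynamic_templates" : [{
        "German Fields" : {
          "match": "*_de_DE",
          "match_mapping_type": "string",
          "mapping": {
            "type": "string",
            "analyzer": "german"
          }
        }
      }]
    }
  }
}
```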

4. Searching

Of course, THIS is the main feature! Searching includes sorts, facets, highlights, and pagination, among other features. The plugin has been designed to support all the features that Liferay supports. See Liferay's Faceted Search blog entry for more information.

5. Others

There are some known issues at the moment, due to the particularities of Elastic Search and Liferay. For example, sorting on unmapped fields throws an error in Elastic Search, so I added an option in portlet.properties to ignore unmapped fields when sorting (set elasticsearch.sort.ignore.unmapped to true). If the behavior ever changes, it can simply be reconfigured.
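In portlet.properties this setting looks like the others above:

```properties
#
# Ignore unmapped fields when sorting, to avoid errors when sorting
# on a field that has no mapping yet
#
elasticsearch.sort.ignore.unmapped = true
```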

There is also a challenge with certain indexers and Elastic Search due to the UID definition. The affected indexers are PluginPackageIndexer and WikiIndexer: they accept some characters - for example (, ) or / - that Elastic Search doesn't handle. This can be worked around with IndexerPostProcessors; since different search engines have different characteristics, Liferay doesn't manage them all (and it isn't supposed to!).

Conclusion

So far the integration with Elastic Search has been a really good experience. I discovered that Elastic Search is a powerful search engine that allows robust integration with your data and custom developments, since it is open source and easy to use. The Jest client made it possible to encapsulate all node management and connection handling (the next step will be to encapsulate the use of Jest itself, so it can easily be replaced by other connection management solutions), and because Jest is open source I was able to contribute a little help when needed. The integration with Liferay's faceted search has been totally successful, with really good performance!

The next question is: how do I integrate Liferay with other sources and Elastic Search? There are many different options, and the right one depends on your requirements. But that deserves another blog post...

And that's all for the moment! If you have any questions, need help, or want to share your impressions, just comment at the bottom of this post or contact XTIVIA! We are glad to help you!
