Choosing the Best Stemmer for Your Solr Collection

Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base, or root form, generally a written word form. Solr applies stemming through filters, and each stemmer differs in the number of scenarios it covers. For one of my projects, we built a matrix to help us choose between them. It may help you make the same decision.
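
To make the idea of a decision matrix concrete, here is a minimal sketch in Python. The two stemmers below are toy functions written only for illustration. They are not the algorithms behind any Solr filter, and the scenarios and words are assumptions, not the matrix from the project.

```python
# Toy stemmers for illustration only; these are NOT Solr's algorithms.

def minimal_stem(word):
    """Light stemmer: strips only simple plural endings."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def aggressive_stem(word):
    """Heavier stemmer: also strips a few common suffixes."""
    word = minimal_stem(word)
    for suffix in ("ing", "ed", "ly"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical stemmer names, mapped to the toy functions above.
STEMMERS = {"minimal": minimal_stem, "aggressive": aggressive_stem}


def coverage_matrix(cases):
    """Map each scenario to {stemmer name: True if it produced the expected stem}."""
    return {
        scenario: {
            name: stem(word) == expected for name, stem in STEMMERS.items()
        }
        for scenario, (word, expected) in cases.items()
    }


# Scenario -> (input word, stem we want). All values are made-up examples.
cases = {
    "plural": ("cats", "cat"),
    "plural -ies": ("queries", "query"),
    "gerund": ("indexing", "index"),
    "past tense": ("indexed", "index"),
}

matrix = coverage_matrix(cases)
for scenario, row in matrix.items():
    print(scenario, row)
```

For a real collection, the same table could be filled in by running sample terms through each candidate filter on Solr's analysis screen instead of through toy functions.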

Nested JSON Objects with Solr

We use Solr to store different types of structured data. Solr works well and feels intuitive as long as every property of an entity is a basic type such as string, number, or date. But the moment we want to index an entity with relations (which is quite common), the intuitiveness of the response has to be compromised. Teams handle this with different strategies. We tried several approaches and settled on a custom response writer combined with a naming convention in the schema. Note that this approach will not help those who work with a dynamic schema or in schemaless mode.
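
As a rough illustration of how a naming convention can carry nesting, the Python sketch below rebuilds a nested object from a flat document. The dot separator and the field names are assumptions made for this example. The actual convention is not shown here, and a real Solr response writer would be a Java class rather than a Python function.

```python
import json


def nest(flat_doc, sep="."):
    """Rebuild a nested dict from a flat document whose field names encode the path."""
    nested = {}
    for field, value in flat_doc.items():
        parts = field.split(sep)
        node = nested
        # Walk (or create) the intermediate objects named by the path.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Hypothetical flat document, as it might be stored in the index.
flat = {
    "id": "42",
    "name": "Alice",
    "address.city": "Pune",
    "address.zip": "411001",
}

print(json.dumps(nest(flat), indent=2))
```

This sketch assumes no field is both a plain value and a path prefix (for example `address` alongside `address.city`). A fixed schema makes that easy to guarantee.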

Fix: Issue with Getting the Slide Title Using Apache POI

Indexing office files is a common case when developing search applications. In my case, I needed to index the slides of a presentation with the title and content indexed separately, because the title needs a higher boost. But when extracting the title with the XSLFSlide method getTitle(), no title is returned for many slides.

Solr: Same Configuration Files for Master/Slave in Every Environment

A Solr core needs many configuration files. They look different for master and slave. Some files, such as data-config.xml, also differ between environments when you have data sources to index from. With these permutations, you can end up with large sets of config files. Here I explain how I managed to maintain a single set for all servers and all environments.