Build a search engine with strus

What is strus?

The strus project is a collection of libraries and tools written in C++ for building a competitive search engine. It is currently a one-person project that started in September 2014, so its competitiveness in terms of features is still more a promise than a fact. It definitely needs more brains put into it to catch up with the big players among open source search engines: Lucene, Xapian and Sphinx. But strus is not just a me-too project for search.

  • Strus introduces expression matching and information extraction on a different level than other known open source engines (read more…).
  • Strus simplifies the architecture of a search engine by “outsourcing” components such as the key/value store database that stores the data blocks. This componentization (see components of strus) drastically reduces the amount of code and creates opportunities for experts on a specific topic to contribute (read more…). Strus is not the first attempt at this, but it is the first open source project to try it with performance within reach of the big open source search engines. And it does so without a 10-year history of optimization behind it. Strus might not be at eye level yet, but let’s see what happens when more diverse reasoning and competition are put into it.

Who is strus for?

The people I would primarily like to address with this blog are developers and hackers, as potential contributors or as a source of feedback. Beyond that, strus could already be interesting for experimental projects that can afford to move along with its development. As a stakeholder you can influence the project too. As the Wikipedia demo project shows, it is already possible to build projects with strus, but you should be aware that you cannot commit to hard deadlines, because you might hit a point where a feature you need is not immediately available. Project planning is difficult at the current stage. Furthermore, the documentation is still quite poor.

Programming paradigms

All interfaces of strus are pure. No inheritance is used in the main header files. Strus is more of a Lego kit than a provider of solution classes. If, for example, you want to build a sequence of terms as a feature for your search, you have to build its expression tree with the help of a stack, rather than picking a class that implements a sequence query. In PHP this looks as follows:

$terms = [ "hello", "world" ];
$query->pushTerm( "word"/*feature type*/, $terms[0] );
$query->pushTerm( "word"/*feature type*/, $terms[1] );
$query->pushExpression( "sequence", 2/*nof terms*/, 2/*position range*/);
$query->defineFeature( "docfeat" /*name addressing this feature set*/);
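To make the stack mechanics visible, here is a minimal sketch in Python of what such a builder might do internally. It is not strus code: the class QueryBuilder, the Node type and all method names are hypothetical, and it assumes that defining a feature takes the finished expression off the stack.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the expression tree: a term leaf or an operator."""
    op: str               # "term" for leaves, otherwise the operator name
    args: tuple = ()      # (type, value) for leaves, child nodes otherwise
    pos_range: int = 0    # position range, only meaningful for operators


@dataclass
class QueryBuilder:
    """Hypothetical stand-in for the query object; not the strus API."""
    stack: list = field(default_factory=list)
    features: dict = field(default_factory=dict)

    def push_term(self, type_, value):
        # A term is a leaf: it simply goes on top of the stack.
        self.stack.append(Node("term", (type_, value)))

    def push_expression(self, op, nof_args, pos_range=0):
        if nof_args > len(self.stack):
            raise ValueError("not enough operands on the stack")
        # Take the topmost nof_args nodes, keeping their push order,
        # and replace them with one operator node that owns them.
        first = len(self.stack) - nof_args
        args = tuple(self.stack[first:])
        del self.stack[first:]
        self.stack.append(Node(op, args, pos_range))

    def define_feature(self, name):
        # Assumption: the node on top of the stack is consumed here.
        self.features.setdefault(name, []).append(self.stack.pop())


# Same steps as the PHP example above.
query = QueryBuilder()
for word in ["hello", "world"]:
    query.push_term("word", word)
query.push_expression("sequence", 2, 2)
query.define_feature("docfeat")
```

The point of the design is that one small set of push operations can express any tree: a sequence inside a union inside a proximity condition needs no additional classes, only more pushes.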

The number of interface classes is small (see for example the interface classes of the core), but you have to understand them. If you want to contribute, you should also have a closer look at the programming guidelines.

Try it!

There is a guide on how to fetch, build and install strus. Unfortunately a tutorial is still missing. There will be one soon!

Support

I will reply to questions. Please mail me at contact at project dash strus dot net.

Thanks

I want to thank the authors of LevelDB here. For some time I had been looking for a key/value store database with an upper bound seek function in its interface. The upper bound seek is crucial because it lets you minimize block accesses on disk when joining sets. A key/value store without upper bound seek would have forced me to create virtual blocks that point to other blocks, which would mean more disk accesses to fetch the data blocks needed. LevelDB has it. Any alternative candidate for implementing the database interface has to have it too.
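Why the seek matters when joining sets can be shown with a small Python sketch. This is an illustration under assumptions, not how strus stores or joins its blocks: PostingCursor and intersect are hypothetical names, the sets are plain sorted lists of non-negative document numbers, and every call to seek stands in for one access to the store.

```python
from bisect import bisect_left


class PostingCursor:
    """Cursor over a sorted list of document numbers (hypothetical helper)."""

    def __init__(self, docnos):
        self.docnos = sorted(docnos)
        self.pos = 0
        self.seeks = 0  # counts how often the "store" was accessed

    def seek(self, target):
        """Move to the first docno >= target; return it, or None at the end."""
        self.seeks += 1
        self.pos = bisect_left(self.docnos, target, self.pos)
        if self.pos < len(self.docnos):
            return self.docnos[self.pos]
        return None


def intersect(a, b):
    """Join two cursors; each side seeks directly to the other's candidate."""
    result = []
    x, y = a.seek(0), b.seek(0)
    while x is not None and y is not None:
        if x == y:
            result.append(x)
            x = a.seek(x + 1)
            y = b.seek(y + 1)
        elif x < y:
            x = a.seek(y)  # skip everything in a that is below y
        else:
            y = b.seek(x)  # skip everything in b that is below x
    return result


# A rare term joined with a very frequent one.
rare = PostingCursor([7, 5000, 90000])
common = PostingCursor(range(0, 100000, 2))
hits = intersect(rare, common)
print(hits, rare.seeks, common.seeks)
```

The frequent list has 50000 entries, yet the join touches it only a handful of times, because each seek jumps straight past everything the rare list cannot match. Without such a seek, the join would have to walk the frequent list entry by entry or follow extra index blocks to find its way.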

Social Media

Github: patrickfrey

Twitter: @ProjectStrus
