Using Pelican and Tipue Search for a (Client-Side) Search Function in a Static Website

Because I use a static website generator, in my case Pelican, I'm very limited in the services I can offer my readers. All content and every page has to be generated beforehand; nothing can be computed on the server when a reader requests an article, so server-side search engines are not an option. A good alternative is a client-side search engine written in JavaScript, for example Tipue Search, which is open source and free to use.

Tipue works by loading essentially all the content of your website into memory as JSON and searching it in JavaScript for a term the reader enters, e.g. by typing into a search bar. The Pelican plugin tipue_search generates the required file, which assigns all the contents of your website to a JavaScript variable that Tipue then searches.
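
To make the idea concrete, here is a minimal sketch in Python of what such an in-memory search boils down to. It is only an illustration: Tipue itself runs in the browser in JavaScript and its scoring is more elaborate. The sample pages, the weighting and the function naive_search are assumptions made up for this example.

```python
# Illustration only: the whole site content lives in one in-memory structure,
# and a search is a scan over it. Data and names here are hypothetical.
site_content = {
    "pages": [
        {"title": "Static search with Pelican",
         "text": "Pelican generates static pages, search runs in the browser.",
         "url": "/blog/static-search.html"},
        {"title": "About",
         "text": "A page about the author.",
         "url": "/pages/about.html"},
    ]
}


def naive_search(content, term):
    """Return (score, url) pairs for all matching pages, best match first."""
    term = term.lower()
    results = []
    for page in content["pages"]:
        # Count occurrences; a hit in the title weighs more than one in the text.
        score = (3 * page["title"].lower().count(term)
                 + page["text"].lower().count(term))
        if score > 0:
            results.append((score, page["url"]))
    return sorted(results, reverse=True)


print(naive_search(site_content, "search"))
```

The point is that every search touches the complete content, which is why the size of the generated file matters.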

I've just noticed that I have an older version (5.0 from 2015). I assume this article can still be used to deploy a search function with a more recent version of Tipue Search. The current version is 7.1 [2].

Addendum: I've just merged my changes manually into the current 7.1 version. Apart from a few peculiarities, like the related-section behaving oddly, I didn't find any problems (after a decent amount of swearing of course).

Installation

Pelican Plugin

The Pelican Plugin can be found in the Pelican Plugins Repository [1]. Installing these plugins is quite easy:

  1. Clone the repository somewhere into your Pelican folder, e.g. to pelican-plugins
  2. Add the plugin you want to the PLUGINS variable in your pelicanconf.py
  3. Install the prerequisites of the tipue_search plugin

Cloning the repository is done with git. Make sure to clone recursively, as the repo uses submodules:

git clone --recursive https://github.com/getpelican/pelican-plugins

Then update your pelicanconf.py to point to the directory (or directories) of your plugins and list every plugin you want to use. I'm using better_codeblock_line_numbering, css-html-js-minify and tipue_search.

PLUGIN_PATHS = [ "pelican-plugins" ]
PLUGINS = [ "better_codeblock_line_numbering", 'css-html-js-minify', 'tipue_search' ]
TIPUE_SEARCH = True
TEMPLATE_PAGES = {
    'search.html': 'search.html',
}

Then you only need to install the prerequisites of your plugins; in the case of tipue_search this is the Python module beautifulsoup4. Make sure to install it for the same Python version your Pelican installation uses (you may want to use pip3 on your distribution).

pip install beautifulsoup4

Then you should be good to go. When you run Pelican (for example by using the Makefile), you will find a tipuesearch_content.js in your output-directory.
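
To check that the generated file contains what you expect, you can parse it again in Python. This is a small sketch, assuming the file consists of a single JavaScript assignment of an object literal with a "pages" list inside; the helper names load_tipue_content and summarize and the directory name "output" are made up for this example.

```python
import json
from pathlib import Path


def load_tipue_content(js_source):
    """Extract the JSON object from the text of tipuesearch_content.js."""
    # Assumption: the file is one JavaScript assignment of an object literal,
    # so everything between the first "{" and the last "}" is valid JSON.
    start = js_source.index("{")
    end = js_source.rindex("}") + 1
    return json.loads(js_source[start:end])


def summarize(path):
    """Print how many pages were indexed and list their titles."""
    data = load_tipue_content(Path(path).read_text(encoding="utf-8"))
    pages = data.get("pages", [])
    print(f"{len(pages)} pages indexed")
    for page in pages:
        print("-", page.get("title", "(no title)"))


# Usage, assuming Pelican writes to a directory called "output":
# summarize("output/tipuesearch_content.js")
```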

The JavaScript part

  1. Create a new template for your search page
  2. Download and include Tipue Search and jQuery if you haven't already

Now create a new template search.html. For starters it should feature a search bar and an area for the results. Make sure you include the Tipue Search JavaScript files and your content file. You will also need jQuery on your website, as Tipue Search is based on jQuery (but that requirement should be quite easy to remove with some JavaScript knowledge).

{% extends "base.html" %}

{% block title %}
  Durchsuche {{ super() }}
{% endblock title %}

{% block extra_head %}
  <link rel="stylesheet" type="text/css" href="/static/css/tipuesearch.css">
  <script src="/tipuesearch_content.js"></script>
  <script src="/static/js/tipuesearch_set.js"></script>
  <script src="/static/js/tipuesearch.min.js"></script>

  <script>
    $(document).ready(function() {
      $('#tipue_search_input').tipuesearch({
          'show': 10,               // shows 10 found entries
          'descriptiveWords': 75,   // shows 75 words of teaser text
          'newWindow': false,       // don't open a new window, when clicking a result
      });
    });
  </script>
{% endblock %}

{% block content %}
<h1>Durchsuchen</h1>
<form action="search.html">
    <div class="tipue_search_group">
        <input type="text" name="q" id="tipue_search_input">
    </div>
</form>
<div>
    <div id="tipue_search_content"></div>
</div>
{% endblock content %}

At a minimum you need the tipue_search_input input as your search bar and the tipue_search_content div as the area in which the results are placed. By default, a URI parameter is used as input for the search script. The tipuesearch_set.js file contains some strings and a list of skipped words that you may want to adjust. It's in English, so you may want to translate it to your primary language.

My Changes

I've modified both the Pelican plugin and the JavaScript and CSS, as I wasn't happy with the default behaviour.

Pelican Plugin

I've modified the behaviour of the Pelican plugin a bit. I wasn't satisfied with how it handled categories (it didn't handle them at all), it didn't include thumbnails or dates, and it called tags categories. Therefore I changed the create_json_node function to the following:

    def create_json_node(self, page):
        if getattr(page, 'status', 'published') != 'published':
            return

        soup_title = BeautifulSoup(page.title.replace('&nbsp;', ' '), 'html.parser')
        page_title = soup_title.get_text(' ', strip=True).replace('“', '"')
        page_title = page_title.replace('”', '"').replace('’', "'")
        page_title = page_title.replace('^', '&#94;')

        soup_text = BeautifulSoup(page.content, 'html.parser')
        page_text = soup_text.get_text(' ', strip=True).replace('“', '"')
        page_text = page_text.replace('”', '"').replace('’', "'")
        page_text = page_text.replace('¶', ' ').replace('^', '&#94;')
        page_text = ' '.join(page_text.split())

        page_category = page.category.name if getattr(page, 'category', 'None') != 'None' else ''

        tags = page.tags if getattr(page, 'tags', 'None') != 'None' else ''
        # join the tag names into one comma-separated string
        page_tags = ','.join(t.name for t in tags)

        # parentheses allow the expression to span two lines
        page_thumbnail = ('/data/thumbnail/' + page.thumbnail + '_large.jpg'
                          if getattr(page, 'thumbnail', 'None') != 'None' else '')

        page_date = page.date.strftime("%d.%m.%Y")

        page_url = '.'
        if page.url:
            if "pages/" in page.url:
                page_url = "/" + page.url
            else:
                page_url = "/" + page_category.lower() + "/" + page.url

        node = {'title': page_title,
                'text': page_text,
                'category': page_category,
                'tags': page_tags,
                'url': page_url,
                'thumbnail': page_thumbnail,
                'date': page_date,
                'loc': page_url} 
        self.json_nodes.append(node)

I'm still thinking about changing the page_text handling as well. I'm not happy with shipping the full text of all my blog entries in one JSON file. For huge blogs this could also cause problems on slower internet connections.
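
One possible way to keep the file small would be to index only the beginning of each article. The following is a sketch of that idea, not something the plugin offers; the helper shorten_text and the limit of 200 words are assumptions for illustration.

```python
def shorten_text(text, max_words=200):
    """Keep only the first max_words words of text, joined by single spaces."""
    return " ".join(text.split()[:max_words])
```

Inside create_json_node this could be applied as page_text = shorten_text(page_text) before the node is built. The trade-off is that anything after the cut-off can no longer be found by the search.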

Troubles with css-html-js-minify

I found that using the minification plugin css-html-js-minify results in an unreadable tipuesearch_content.js file. So I've also altered the output path and move the file in my Makefile after Pelican has run and created my website.
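
I do this step in my Makefile, but the same move can be sketched as a small Python helper that runs after the build. The function name move_search_index and the directory names in the usage line are hypothetical; adjust them to wherever your modified plugin writes the file.

```python
import shutil
from pathlib import Path


def move_search_index(source, output_dir):
    """Move the generated index into the output directory, return the new path."""
    source = Path(source)
    target = Path(output_dir) / source.name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Remove a stale copy from an earlier build so the move cannot collide.
    if target.exists():
        target.unlink()
    shutil.move(str(source), str(target))
    return target


# Usage, assuming the plugin wrote the file to a directory "build"
# that the minifier never touches:
# move_search_index("build/tipuesearch_content.js", "output")
```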

JavaScript

I left the search logic completely intact for the time being. I did, however, remove options and functions I didn't intend to use, and I modified the output. In a single file of about 600 lines of JavaScript, Tipue searches the content, sorts the results by score and writes the resulting HTML to a variable out.

The behaviour is configurable through options that control whether certain things are displayed. The order of the output, however, cannot be altered. Also, the default pagination is at the bottom only; I wanted it both above and below the results. These are the options (in the older 5.0 version). They can be found at the very top of tipuesearch.js.

var set = $.extend( {
    'show'                   : 7,
    'newWindow'              : false,
    'showURL'                : true,
    'showTitleCount'         : true,
    'minimumLength'          : 3,
    'descriptiveWords'       : 25,
    'highlightTerms'         : true,
    'highlightEveryTerm'     : false,
    'mode'                   : 'static',
    'liveDescription'        : '*',
    'liveContent'            : '*',
    'contentLocation'        : 'tipuesearch/tipuesearch_content.json',
    'debug'                  : false          
}, options);

This is my altered output part of tipuesearch.js:

// last unaltered line:
found.sort(function(a, b) { return b.score - a.score } );

var pagination = "";
// if the count (c) of the found array is bigger than the configured number of shown results
// -> generate pagination
if (c > set.show) {
    var pages = Math.ceil(c / set.show);
    var page = (start / set.show);
    pagination += '<div id="pagination" style="text-align:center;margin-bottom: 10px">';

    if (start > 0) {
        pagination += '<span class="pagination_element prev">\
            <a data-id="' + (start - set.show) + '_' + replace + '">«</a></span>'; 
    } else {
        pagination += '<span class="pagination_element_disabled">«</span>'; 
    }

    if (page <= 2) {
        var p_b = pages;
        if (pages > 3) {
            p_b = 3;
        }
        for (var f = 0; f < p_b; f++) {
            if (f == page) {
                pagination += '<span class="pagination_element pagination_element_active">'+(f+1)+'</span>';
            } else {
                pagination += '<span class="pagination_element">\
                    <a data-id="'+(f*set.show)+'_'+replace+'">'+(f+1)+'</a></span>';
            }
        }
    } else {
        var p_b = page + 2;
        if (p_b > pages) {
            p_b = pages; 
        }
        for (var f = page - 1; f < p_b; f++) {
            if (f == page) {
                pagination += '<span class="pagination_element pagination_element_active">'+(f+1)+'</span>';
            } else {
                pagination += '<span>\
                    <a class="tipue_search_foot_box" data-id="'+(f*set.show)+'_'+replace+'">'+(f+1)+'</a>\
                </span>';
            }
        }
    }

    if (page + 1 != pages) {
        pagination += '<span class="next pagination_element">\
            <a data-id="' + (start + set.show) + '_' + replace + '">»</a></span>'; 
    } else {
        pagination += '<span class="next pagination_element_disabled">»</span>'; 
    }

    pagination += '</div>';
}

// add a top pagination bit to the output
out += pagination;

// iterate over the sorted found-array
var l_o = 0;
for (var i = 0; i < found.length; i++) {
    if (l_o >= start && l_o < set.show + start) {
        var t = found[i].desc;
        var t_d = '';
        var t_w = t.split(' ');
        if (t_w.length < set.descriptiveWords) {
            t_d = t;
        } else {
            for (var f = 0; f < set.descriptiveWords; f++) {
                t_d += t_w[f] + ' ';    
            }
        }
        t_d = $.trim(t_d);
        if (t_d.charAt(t_d.length - 1) != '.') {
            t_d += ' ...';
        }
        f = found[i];
        out += '<article class="post-preview"><div class="post-inner post-hover"><div class="post-thumbnail">';
        if ( f.thumbnail.length > 0 ) {
            out += '<a href="'+f.url+'" title="'+f.title+'" class="hover_img">\
                <div style="background-image:url(\''+f.thumbnail+'\')" class="thumb-medium"></div></a>';
        }
        // close the thumbnail container even when there is no thumbnail
        out += '</div>';

        if ( f.category.length > 0 ) {
            out += '<div class="post-meta group"><span class="post-category">\
                <a href="/category/'+f.category+'.html" rel="category tag">'+f.category+'</a>\
            </span><span class="post-date"><time class="published updated">'+f.date+'</time></span></div>';
        }

        out += '<a href="'+f.url+'" rel="bookmark" title="'+f.title+'">\
            <h2 class="post-title entry-title">'+f.title+'</h2></a>\
            <div class="entry excerpt entry-summary">'+t_d+'\
            <div class="readmore"><a href="'+f.url+'" title="'+f.title+'">[weiter lesen]</a>\
            </div></div></div></article>'
    }
    l_o++;     
}

// add a bottom pagination bit
out += pagination;
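
The page window that the pagination code above produces can be restated compactly. The following is a sketch in Python rather than JavaScript, purely to illustrate the logic; the function name page_window is made up, and its parameters mirror c, set.show and start from the snippet.

```python
import math


def page_window(total, show, start):
    """Return the zero-based page indexes that get a pagination element."""
    if total <= show:
        return []  # everything fits on one page, no pagination is generated
    pages = math.ceil(total / show)
    page = start // show  # index of the page currently shown
    if page <= 2:
        # near the beginning: always the first (up to) three pages
        return list(range(min(pages, 3)))
    # otherwise: the previous page, the current page and the next page
    return list(range(page - 1, min(page + 2, pages)))


print(page_window(100, 10, 30))
```

So at most three page numbers are shown at a time, framed by the previous and next arrows.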

Conclusion

With a few more modifications to the way Tipue Search works, e.g. binding it to keyup on the search bar instead of relying on a q parameter in the URL, it becomes quite usable. A few flaws and bugs will have crept in, but I now have a working search engine in my statically generated blog.

As a reference I'll leave the plugin and my JavaScript files here. Please take them with a grain of salt - it is much safer to just use the upstream version, especially if you are reading this article a few years from now!

References

[1] https://github.com/getpelican/pelican-plugins
[2] https://tipue.com/search/

Last edit: 04.05.2020 21:00