RUA

I use Next.js and the MDX plugin to build my blog site. It's a Next.js SSG project.

It's also a Jamstack site, so I need an external search engine.

Algolia is my first choice. We can build our own Algolia frontend UI, or use DocSearch.

GitHub - algolia/docsearch: The easiest way to add search to your documentation.

https://github.com/algolia/docsearch

Purpose

Algolia splits DocSearch into two parts:

A crawler to crawl our sites.
A frontend UI library to show search results.

In the legacy edition, Algolia provided docsearch-scraper so we could build our own crawler.

It can still be plugged into DocSearch v3, but it is now deprecated.

They introduced the Algolia Crawler web interface to manage the crawler.

Crawler Admin Console

https://crawler.algolia.com/admin/users/login

But I can't log in with my Algolia account.

So I need to find another way to generate my post index.

Index format

The DocSearch frontend UI reads results in a specific format. If the records we push to Algolia follow that same format, the DocSearch frontend UI will work without the crawler.

Push our data

Algolia provides a JavaScript API client to push data to Algolia. Install it with either yarn or npm:

yarn add algoliasearch

npm install algoliasearch

The client will push the data to Algolia for us. We just need to prepare our data.

DocSearch format

Because DocSearch reads results in a specific format, each record needs to look like this:

{
  content: null,
  hierarchy: {
    lvl0: 'Post',
    lvl1: slug,
    lvl2: heading,
  },
  type: 'lvl2',
  objectID: 'id',
  url: 'url',
}
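As a minimal sketch of that shape, the record can be produced by a small helper. The function name buildRecord and the example values (slug, heading, objectID, url) are hypothetical and only illustrate which field goes where:

```javascript
// Hypothetical helper: builds one DocSearch-style record for an h2 heading.
function buildRecord({ slug, heading, objectID, url }) {
  return {
    content: null, // no body text is indexed, only the hierarchy
    hierarchy: { lvl0: 'Post', lvl1: slug, lvl2: heading },
    type: 'lvl2', // the deepest hierarchy level this record represents
    objectID,
    url,
  };
}

// Example values are placeholders, not real posts.
const record = buildRecord({
  slug: 'hello-world',
  heading: 'Getting started',
  objectID: 'example-id',
  url: 'https://example.com/p/hello-world#getting-started',
});
console.log(JSON.stringify(record, null, 2));
```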

Generate format

To generate our data, we need:

dotenv: reads the Algolia app ID and admin key from the .env file.
algoliasearch: the JavaScript API client.
fs and path: read the post files.
nanoid (optional): generates a unique objectID.

To use ECMAScript import statements, give the file the .mjs suffix so that Node.js treats it as an ES module.

// build-search.mjs

import { config } from 'dotenv';
// Use the full client: the lite build is search-only and cannot write records.
import algoliasearch from 'algoliasearch';
import fs from 'fs';
import path from 'path';
import { nanoid } from 'nanoid';

Next, read the post content. First we list the files in the posts directory:

const files = fs.readdirSync(path.join('pages/p'));

Then prepare an empty array to store the post data, and traverse each file to generate the records we need.

const myPosts = [];
files.forEach((f) => {
  const content = fs.readFileSync(path.join('pages/p', f), 'utf-8');

  // The file name without its extension is the post slug.
  const slug = f.replace(/\.mdx$/, '');
  // Match h2 lines only: exactly two leading '#' characters.
  const regex = /^#{2}(?!#)(.*)/gm;

  content.match(regex)?.forEach((h) => {
    // Drop the leading '## ' to get the heading text.
    const heading = h.substring(3);

    myPosts.push({
      content: null,
      hierarchy: {
        lvl0: 'Post',
        lvl1: slug,
        lvl2: heading,
      },
      type: 'lvl2',
      objectID: `${nanoid()}-https://rua.plus/p/${slug}`,
      url: `https://rua.plus/p/${slug}#${heading
        .toLocaleLowerCase()
        .replace(/ /g, '-')}`,
    });
  });
});

The type property is the level of the record in the table of contents.

I only want h2 titles in the search results, so I match just those with /^#{2}(?!#)(.*)/gm.
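To see what that regex does and does not match, here is a small check on an invented Markdown string. The sample content is an assumption made up for this illustration:

```javascript
// Invented sample content: one h1, two h2 headings and one h3.
const sample = [
  '# Title',
  '## First section',
  'Some text',
  '### Nested section',
  '## Second section',
].join('\n');

// ^#{2} needs two '#' at the start of a line; (?!#) rejects a third one.
const regex = /^#{2}(?!#)(.*)/gm;

// match() returns null when nothing matches, so fall back to an empty array.
const headings = (sample.match(regex) ?? []).map((h) => h.substring(3));
console.log(headings); // [ 'First section', 'Second section' ]
```

The h1 line fails because it has only one '#', and the h3 line fails because of the negative lookahead.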

The post title is the lvl1 type. Inside the same loop over the files, push one more record per post:

myPosts.push({
  content: null,
  hierarchy: {
    lvl0: 'Post',
    lvl1: slug,
  },
  type: 'lvl1',
  objectID: `${nanoid()}-https://rua.plus/p/${slug}`,
  url: `https://rua.plus/p/${slug}`,
});
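Putting the two kinds of record together, the per-post logic can be sketched as a pure function, which is easier to check without touching the file system. The name postToRecords and its baseUrl and makeId parameters are assumptions introduced for this sketch; in the real script makeId would be nanoid:

```javascript
// Hypothetical helper: returns the lvl1 record plus one lvl2 record per h2.
function postToRecords(slug, content, { baseUrl, makeId }) {
  const postUrl = `${baseUrl}/${slug}`;
  const records = [
    {
      content: null,
      hierarchy: { lvl0: 'Post', lvl1: slug },
      type: 'lvl1',
      objectID: `${makeId()}-${postUrl}`,
      url: postUrl,
    },
  ];

  const h2Lines = content.match(/^#{2}(?!#)(.*)/gm) ?? [];
  for (const line of h2Lines) {
    const heading = line.substring(3); // strip the leading '## '
    records.push({
      content: null,
      hierarchy: { lvl0: 'Post', lvl1: slug, lvl2: heading },
      type: 'lvl2',
      objectID: `${makeId()}-${postUrl}`,
      // Same anchor rule as above: lower-case, spaces become dashes.
      url: `${postUrl}#${heading.toLocaleLowerCase().replace(/ /g, '-')}`,
    });
  }
  return records;
}

// A counter stands in for nanoid so the example has no dependencies.
let counter = 0;
const records = postToRecords(
  'hello',
  '# Hello\n\n## Getting Started\n\ntext\n\n### Detail\n',
  { baseUrl: 'https://rua.plus/p', makeId: () => String(++counter) },
);
console.log(records.map((r) => r.url));
```

Note that the anchor only matches the rendered page if the site generates heading IDs with the same rule, so headings with punctuation may need extra handling.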

Push to Algolia

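The Algolia API is easy to use. With a client created from the app ID and admin key, we first specify the index name.

const index = client.initIndex('rua');

Then we save the objects. replaceAllObjects replaces the whole content of the index with the records we generated.

const algoliaResponse = await index.replaceAllObjects(myPosts);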
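The snippets above use a client without showing where its credentials come from. A minimal sketch of that step, assuming the .env file defines ALGOLIA_APP_ID and ALGOLIA_ADMIN_KEY (both names are hypothetical, use whatever your file defines), is to validate them before creating the client:

```javascript
// Hypothetical helper: reads the two credentials and fails early if one is missing.
// The variable names ALGOLIA_APP_ID and ALGOLIA_ADMIN_KEY are assumptions.
function readAlgoliaCredentials(env) {
  const appId = env.ALGOLIA_APP_ID;
  const adminKey = env.ALGOLIA_ADMIN_KEY;
  if (!appId || !adminKey) {
    throw new Error('Missing ALGOLIA_APP_ID or ALGOLIA_ADMIN_KEY');
  }
  return { appId, adminKey };
}

// An inline object stands in for process.env in this example.
const creds = readAlgoliaCredentials({
  ALGOLIA_APP_ID: 'app',
  ALGOLIA_ADMIN_KEY: 'key',
});
console.log(creds.appId);
```

In build-search.mjs this would be called as readAlgoliaCredentials(process.env) after config() from dotenv has run, and the client would then be created with algoliasearch(appId, adminKey) before calling initIndex.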

All done!

Setting up DocSearch for next.js
