# Overview and Backstory

- I bought a copy of a website, `newadvent.org`, which is full of great stuff.
- I'm not allowed to share the website. Go buy a copy or use https://newadvent.org/
- One thing I wanted to have with the website was my own search function.
- New Advent uses Google as the search engine on their website. I don't like that.
- I discovered this project: https://pagefind.app/

```
Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users’ bandwidth as possible, and without hosting any infrastructure.

Pagefind runs after Hugo, Eleventy, Jekyll, Next, Astro, SvelteKit, or any other website framework. The installation process is always the same: Pagefind only requires a folder containing the built static files of your website, so in most cases no configuration is needed to get started.

After indexing, Pagefind adds a static search bundle to your built files, which exposes a JavaScript search API that can be used anywhere on your site. Pagefind also provides a prebuilt UI that can be used with no configuration. (You can see the prebuilt UI at the top of this page.)

The goal of Pagefind is that websites with tens of thousands of pages should be searchable by someone in their browser, while consuming as little bandwidth as possible. Pagefind’s search index is split into chunks, so that searching in the browser only ever needs to load a small subset of the search index. Pagefind can run a full-text search on a 10,000 page site with a total network payload under 300kB, including the Pagefind library itself. For most sites, this will be closer to 100kB.
```

- One problem I had was that New Advent uses `.htm` pages, not `.html` pages.
- So when I tried indexing the directory, it didn't work, because by default `pagefind` only picks up `.html` files.
- I had to use several scripts to rename things so it would work.
- I also had to figure out why I was getting `Permission Denied` errors while running `pagefind`.
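Before renaming anything, it can help to see how many pages Pagefind's default glob would miss. The sketch below is a minimal check, assuming a `find` that supports `-iname` (GNU and BSD both do); `count_ext` is a hypothetical helper name, not part of Pagefind. Separately, Pagefind documents a `glob` option (`--glob` on the CLI) whose default is `**/*.{html}`, the same pattern that shows up in the indexing output. Pointing it at something like `**/*.{htm,html}` might let Pagefind index the `.htm` pages without renaming them, though that hasn't been tried on this site.

```shell
#!/bin/bash
# Hypothetical helper: count files under a directory by extension.
# Useful for seeing how many pages Pagefind's default **/*.{html} glob skips.
count_ext() {
  local dir="$1" ext="$2"
  # -iname is case-insensitive, so both page.htm and PAGE.HTM are counted
  find "$dir" -type f -iname "*.${ext}" | wc -l
}
```

For example, `count_ext newadvent htm` next to `count_ext newadvent html` shows how much of the site the default glob would actually see.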
## Permission Problems

- The problem happened when I was in `/mnt/drive/scholastia/` and trying to get pagefind to build the index.
- When I ran `./pagefind --site "newadvent"`, I got a `Permission Denied` error.
- I couldn't figure out what the problem was.
- I ended up copying the binary to `/etc/pagefind/` and doing a test run from there. It worked.
- Then I realized the 18,000 pages weren't being indexed.
- That's when I realized I had to change all of the `.htm` extensions to `.html`.

## HTM to HTML

- `.htm` had to be changed to `.html` in the `href` attributes and anywhere else it showed up in the HTML tags.
- I realized I had to use `sed` and some other shit. Here are the scripts I used.

**Script 1**

- The script below replaces `htm` with `html` in every file under `/etc/workspace/newadvent/`.
- The `\b` word boundaries mean it only matches `htm` EXACTLY, so an existing `html` is left alone.
- Note that `-type f` runs `sed` on every file, images included, not just the HTML.

```
#!/bin/bash
# Replace the whole word "htm" with "html" in every file under the site folder
find /etc/workspace/newadvent/ -type f -exec sed -i -e 's/\bhtm\b/html/g' {} \;
```

**Script 2**

- The script below renames every file ending in `.htm` so it ends in `.html`.
- It only looks at the current directory, so it has to be run in each directory where the `*.htm` files are.
- You have to copy this script into each directory. There are more elegant ways of doing this, but I don't care.

```
#!/bin/bash
# If there are no *.htm files, expand to nothing instead of the literal "*.htm"
shopt -s nullglob
for file in *.htm
do
    # Strip the .htm suffix and add .html
    mv "$file" "${file%.htm}.html"
done
```

**Script 3**

- I realized some links would turn from `.html` into `.htmll`, so I had to fix that with this script.

```
#!/bin/bash
find /etc/workspace/newadvent/ -type f -exec sed -i -e 's/\bhtmll\b/html/g' {} \;
```

**Script 4**

- The script below fixes a weird occurrence: `htm` turned into `htmll4`.
- So I needed to change `htmll4` into `html`.

```
#!/bin/bash
find /etc/workspace/newadvent/ -type f -exec sed -i -e 's/\bhtmll4\b/html/g' {} \;
```

## Pagefind Finally Worked

- Everything finally worked out, and I got my search bar. It's JavaScript, which I don't like, but oh well.
- Solr was a pain to deal with.
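The four scripts above can probably be folded into one pass that is safe to re-run. This is a sketch, not the method used here, and it assumes GNU `find`, `grep`, and `sed` (for `sed -i`, `\b`, and `grep --include`) and lowercase `.htm` names. The function names are hypothetical. Two design choices: the rename walks every subdirectory with `find -print0`, so nothing has to be copied into each folder; and the `sed` pattern is `\.htm\b`, which only matches when `htm` is *not* followed by another word character, so an existing `.html` is never touched and no `.htmll` can appear. `sed` is also limited to `*.html` files so images aren't edited.

```shell
#!/bin/bash
# Sketch: convert a static site from .htm to .html in one pass.
# Assumes GNU find/grep/sed and lowercase ".htm" extensions.

# Rename every *.htm file under $1 to *.html, in all subdirectories.
rename_htm_files() {
  local root="$1"
  # -print0 / read -d '' keeps filenames with spaces intact
  find "$root" -type f -name '*.htm' -print0 |
    while IFS= read -r -d '' f; do
      mv -- "$f" "${f%.htm}.html"
    done
}

# Rewrite ".htm" references inside the HTML files only.
# "\.htm\b" needs a word boundary right after "htm", so ".html" never
# matches and running this twice does not produce ".htmll".
rewrite_htm_links() {
  local root="$1"
  find "$root" -type f -name '*.html' -exec sed -i -e 's/\.htm\b/.html/g' {} +
}

# List HTML files that still contain a ".htm" reference (ideally prints nothing).
find_leftover_htm() {
  grep -rlE --include='*.html' '\.htm([^[:alnum:]_]|$)' "$1"
}

convert_site() {
  rename_htm_files "$1"
  rewrite_htm_links "$1"
}
```

Usage would look like `convert_site /etc/workspace/newadvent` followed by `find_leftover_htm /etc/workspace/newadvent` as a sanity check.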
**Search Bar**

```
```

**Output of Indexing**

```
$ ./pagefind --site "/path/to/directory"

Running Pagefind v1.1.0
Running from: "/etc/pagefind"
Source: "/path/to/directory"
Output: "/path/to/directory"

[Walking source directory]
Found 19045 files matching **/*.{html}

[Parsing files]
Did not find a data-pagefind-body element on the site.
↳ Indexing all <body> elements on the site.
2 pages found without an <html> element.
Pages without an outer <html> element will not be processed by default.
If adding this element is not possible, use the root selector config to target a different root element.

[Reading languages]
Discovered 1 language: unknown

[Building search indexes]
Total:
  Indexed 1 language
  Indexed 18974 pages
  Indexed 438285 words
  Indexed 0 filters
  Indexed 0 sorts

Finished in 231.680 seconds
```
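About the `Permission Denied` error from earlier: the fact that copying the binary to `/etc/pagefind/` fixed it suggests, but doesn't prove, that the copy under `/mnt/drive/` either lacked the execute bit or sat on a filesystem mounted with `noexec`, which is common for external drives. The sketch below is a quick way to tell those apart, assuming Linux with util-linux `findmnt`; `diagnose_exec` is a hypothetical helper name. The mount is checked first because on Linux `[ -x ]` can report false for files on a `noexec` mount even when `+x` is set.

```shell
#!/bin/bash
# Hypothetical helper: guess why running a binary fails with "Permission denied".
# Assumes Linux with util-linux `findmnt` available.
diagnose_exec() {
  local bin="$1"
  if [ ! -e "$bin" ]; then
    echo "missing"
  # Check the mount first: [ -x ] can be false on a noexec mount even with +x set
  elif findmnt -n -o OPTIONS --target "$bin" 2>/dev/null | grep -qw noexec; then
    echo "noexec-mount"   # copying the binary to another filesystem works around this
  elif [ ! -x "$bin" ]; then
    echo "no-exec-bit"    # `chmod +x` on the binary may fix this
  else
    echo "ok"
  fi
}
```

For example, `diagnose_exec /mnt/drive/scholastia/pagefind` would print one of `missing`, `noexec-mount`, `no-exec-bit`, or `ok`.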