Comment by 😺 k8quinn
Re: "pandoc and website preservation"
Pandoc has a lot of options, including one for specifying CSS to be included in the output. In principle, you could point it at the CSS file(s) from the old site via a link. You'd have to look at the page source to find out where the CSS is located. Alternatively, you could open the epub in an editor like Sigil-ebook or even Emacs and add the original CSS, or your own, as appropriate. Disclaimer: I haven't tried this myself.
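A rough sketch of the CSS idea, untested against any real site: pandoc's --css option (it affects HTML and EPUB output and can be repeated) is added to the OP's command. The URL, the output name and old-site.css are placeholders, and the helper pandoc_epub_command is made up for illustration. It only builds the command; convert() hands it to pandoc, which must be installed.

```python
import subprocess


def pandoc_epub_command(source, output, css_files=()):
    """Build the pandoc argument list for turning a page into an EPUB.

    source    -- URL or local file (pandoc accepts either as input)
    output    -- output file name, e.g. "oldarticle.epub"
    css_files -- stylesheets to include, in order (assumed to be
                 copies of the old site's CSS saved locally)
    """
    cmd = ["pandoc", source, "-o", output]
    for css in css_files:
        # --css may be given several times; files are included in order.
        cmd.append(f"--css={css}")
    return cmd


def convert(source, output, css_files=()):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_epub_command(source, output, css_files), check=True)


if __name__ == "__main__":
    # Placeholder URL and file names, not taken from the thread.
    print(" ".join(pandoc_epub_command(
        "https://example.org/old-article.html",
        "oldarticle.epub",
        ["old-site.css"],
    )))
```

Saving the stylesheet locally first seems like the safer route, since the old site may disappear before you convert again.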
2025-11-28 · 5 months ago
4 Later Comments ↓
Another option is to have an AI assistant look at the complicated HTML document and create a text-only, or even a gemtext, version that matches as much as possible...
😎 decant [OP] · Nov 29 at 01:50:
@Half_Elf_Monk For Wikipedia, I use two tricks: 1. A zim archive and the Python library zimply; I don't know if zimply works with a minimal text browser. 2. There is always gempedia; I just save the .gmi file for good articles. But as @stack said, there are many irredeemable JavaScript-heavy sites where I will just download with wget or just ctrl-c ctrl-v the text part.
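For the wget fallback, something like the following might do; it is a sketch, not what anyone in this thread actually runs. The flags are standard GNU wget options, while the helper name wget_page_command, the URL and the mirror directory are invented for illustration. Keep in mind that wget does not execute JavaScript, so on script-heavy sites it only saves whatever is in the static HTML.

```python
import subprocess


def wget_page_command(url, dest_dir="mirror"):
    """Build a wget argument list that saves one page for offline reading."""
    return [
        "wget",
        "--page-requisites",   # also fetch images, CSS, etc. the page needs
        "--convert-links",     # rewrite links so the local copy works offline
        "--adjust-extension",  # add .html/.css suffixes where missing
        f"--directory-prefix={dest_dir}",  # save everything under dest_dir
        url,
    ]


def fetch(url, dest_dir="mirror"):
    """Run wget; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(wget_page_command(url, dest_dir), check=True)


if __name__ == "__main__":
    # Placeholder URL, not a site mentioned in the thread.
    print(" ".join(wget_page_command("https://example.org/article.html")))
```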
🌲 Half_Elf_Monk · Dec 12 at 16:16:
@stack - That would work, but I simply don't trust the LLMs to get details right. @decant - interesting. Realistically, shouldn't there be a way to simply download a snapshot of Wikipedia as a whole? As mostly text, it shouldn't be that big. Why are there no local Wikipedia browsers?
@Half_Elf_Monk -- there is a stripped-down Wikipedia distribution; people often put it on local devices featured on Gizmodo and such... Can't remember where to get it, and I'm not at a decent computer now, but I am sure you can easily find it.
Original Post
pandoc and website preservation: Back when I used Firefox/Chromium, I used their print function to save a full web page to a PDF file. For example, Paul V Bolotoff wrote articles on the history of DEC Alpha CPUs, but his website is long gone; the only copy of the article I could find is in the archive section of someone’s personal site. But I found out I could use pandoc to accomplish this task: pandoc [http link] -o oldarticle.epub I find the epub family of formats better suits my needs. PDF is...