Boolean satisfiability is a classic problem in computer science. Given a series of n Boolean variables A, B, C, ... and a formula in 3-conjunctive normal form
CDCL is a complete and sound method, so the canonical solver line is also the number of solvable instances.
"Abstract Nonsense" is a somewhat loving, but somewhat derisive term for methods (typically
category-theoretic methods) in pure mathematics that are unreasonably convoluted and involve a lot of theoretical machinery.
I myself am awful at category theory but excellent at abstract nonsense, and I wanted a space to share my thoughts
and projects. I'm well aware that very few people will read this blog, but to me this space is a journal. A respite
from the giants that control the web, and a space to share my thoughts into the void, in a way I can control and moderate.
More concretely, I hope to maintain "Abstract Nonsense" as a dev log of sorts. Not because I think it showcases phenomenal
technical talent, but because it showcases some of the cool things I've been learning on the side.
I'll keep my first entry on this journal quite short. This entry stands well on its own.
Because it does something the category theorist in all of our hearts would love.
It's self referential.
The content engine that runs Abstract Nonsense is quite brilliant, if I do say so myself.
It is a Python script that takes in a series of HTML files and agglomerates them into a single file.
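To make the idea concrete, here is a minimal sketch of what such an agglomeration step could look like. It is not the actual engine: the function name `agglomerate`, the filename-sorted ordering, and the bare `<html><body>` wrapper are all assumptions made for illustration.

```python
from pathlib import Path


def agglomerate(entries_dir: Path, output_file: Path) -> int:
    """Concatenate every .html file in entries_dir into a single page.

    Hypothetical helper: the real engine may order or wrap entries differently.
    Returns the number of entries that were merged.
    """
    # Sorting by filename gives a stable, predictable order (an assumption).
    entry_paths = sorted(entries_dir.glob("*.html"))
    parts = [path.read_text(encoding="utf-8") for path in entry_paths]

    # Wrap the joined entries in a bare-bones page skeleton.
    body = "\n".join(parts)
    output_file.write_text(
        f"<html><body>\n{body}\n</body></html>\n", encoding="utf-8"
    )
    return len(parts)
```

Writing the output outside `entries_dir` keeps the merged page from being picked up as an entry on the next run.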
In addition to the Abstract Nonsense engine, I have two other Python scripts that form the backbone of this (static)
website. One script takes in the plaintext of a quote document I have been personally maintaining for the
past 3 years. It uses regular expressions to parse out the quotes and builds an HTML file containing JavaScript
that alters the HTML on the page to create a typing effect.
Check it out
here! The final piece of this beautiful infrastructure is a third script that runs both scripts, then commits the whole branch to master.
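The regular-expression step of the quote script might look roughly like the sketch below. The line format (`"quote text" - attribution`, one quote per line) is an assumption, and `QUOTE_RE` and `parse_quotes` are hypothetical names; the real quote document may be laid out differently.

```python
import re

# Assumed line format: "quote text" - attribution
QUOTE_RE = re.compile(r'^"(?P<text>.+)"\s*-\s*(?P<author>.+)$')


def parse_quotes(plaintext: str) -> list[tuple[str, str]]:
    """Return (quote, author) pairs for every line matching the assumed format."""
    quotes = []
    for line in plaintext.splitlines():
        match = QUOTE_RE.match(line.strip())
        if match:  # lines that are not quotes are silently skipped
            quotes.append((match.group("text"), match.group("author").strip()))
    return quotes
```

The greedy `.+` inside the quotation marks means a dash within the quote itself does not get mistaken for the attribution separator.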
As I learned on Twitter/Reddit/The Quote Document:
"Everybody has a testing environment. Some people are lucky enough enough to have a totally separate environment to run production in." - @stahnma
Abstract Nonsense, and this website as a whole, are both test and prod. Maybe one day I'll be a good enough engineer
to be able to invest in separate test and prod environments for my website.
If you have a fixed budget, or you want to tweak the numbers to see what would need to change to meet certain financial goals, try out the optimizer. The optimizer uses the bisection method to find an input that meets a given goal. For example, say you have $100,000 and you want to figure out how much you can spend on a house: the optimizer will help you budget.
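As a rough illustration of how bisection can back out an input from a goal, here is a minimal sketch. The cost model `upfront_cash` (20% down payment plus 3% closing costs) is entirely made up for the example, and the code assumes the function being searched is monotonically increasing on the interval; the product's real model is not shown here.

```python
def bisect(f, lo, hi, target, tol=1e-6, max_iter=200):
    """Find x in [lo, hi] with f(x) close to target.

    Assumes f is monotonically increasing on [lo, hi].
    """
    if not (f(lo) <= target <= f(hi)):
        raise ValueError("target is not bracketed by f(lo) and f(hi)")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid  # the answer lies in the upper half
        else:
            hi = mid  # the answer lies in the lower half
        if hi - lo < tol:
            break
    return (lo + hi) / 2


def upfront_cash(price):
    """Hypothetical cost model: 20% down payment plus 3% closing costs."""
    return 0.20 * price + 0.03 * price


# How expensive a house can $100,000 of upfront cash cover under this model?
price = bisect(upfront_cash, 0, 10_000_000, 100_000)
print(f"Affordable house price: ${price:,.0f}")
```

Bisection only needs the model to be evaluated, not inverted, which is why it suits a calculator where the goal depends on the input through many intermediate steps.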
Right now this product is in a tech-demo stage. There are two features that we plan to build out in the short term.
This is a weird blog entry. The end goal of this project is to run a 100 mile Backyard Ultramarathon using only free software. This rule is to be interpreted as reasonably as possible and applies only to tech worn or carried through the race; it does not apply to any crew. The rule is to be followed in spirit. For example, small, unavoidable bits of nonfree CPU microcode or modem firmware are acceptable, but care will be taken to isolate such components. One consequence is that any music listened to during the race will be DRM-free. The patents on MP3 have recently expired, so the format is free. I was a pretty rubbish runner for most of my childhood, so this project is both technically and physically grueling.
Distance | P95 | P99 | PR |
---|---|---|---|
Mile | 6:00 | 5:20 | 5:47* |
1.5 mi | 9:10 | 8:22 | 8:58* |
5k | 21:12 | 17:38 | 19:47* |
10k | 41:17 | 34:24 | 44:42 |
Half Marathon | 1:33:04 | 1:18:07 | 1:38:13 |
Marathon | 3:08:42 | 2:44:18 | 3:58:05 |
50k | ??? | ??? | 4:56:35 |
24 hr Backyard ultra ruleset run distance | 50 mi | 100 mi | 34.4 mi |
This "blog" is called "Abstract Nonsense" because of this project. Most language models try to build interesting
output, but end up spouting abstract nonsense (with or without some semantic correctness). Well,
I thought to myself, I have a corpus that itself is really just abstract nonsense. Maybe I could train an NLP
transformer model on this corpus, and oddities of syntax would actually be a feature!
Because the robot is confused, it will also be named Abstract Nonsense, to maximize perplexity with respect to the
identically named blog hosted on this site.
I present to you GPT 9001! It is really just a fine-tuned version of GPT-2, tuned for text generation on
the Quote Doc. In this project I learned that
hand-rolled models that I can quickly train are trash. For example, the first implementation of GPT 9001 was called GPT0
and was just an LSTM model I spun up and trained on the quote doc. It could either predict random
words or overfit the training set. It couldn't do anything of interest :(.
Anyway, without further ado here s/he is:
This update is a quick one.
I learned that this floofer needed some head pats, and I had to help!
This is an important cause, so feel free to compile and run the following Java script (not JavaScript, fortunately) to help out the floofer.
I came across an exciting problem in a Stand-up Maths video.
Using a simple brute-force graph-theory argument, a viewer had gotten the runtime of Matt's solution down from 32 days to 15 minutes. By throwing the book at the problem, I thought we could do better.
With a good fundamental knowledge of the English language and machine structures, other authors
have implemented variants of exhaustive search with a substantially better runtime of around 100 milliseconds, using fundamentally simple code.
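For a feel of what such an exhaustive search can look like, here is a small sketch. It assumes the problem is the one I understand the video to pose (pick five five-letter words that share no letters), and it is not any particular author's code: `letter_mask` and `find_disjoint_words` are hypothetical names, words are assumed to be lowercase a-z, and none of the tuning that gets real solutions down to milliseconds is included.

```python
def letter_mask(word):
    """Encode a lowercase a-z word as a 26-bit mask, one bit per letter."""
    mask = 0
    for ch in word:
        mask |= 1 << (ord(ch) - ord("a"))
    return mask


def find_disjoint_words(words, k):
    """Return every combination of k words in which no letter appears twice."""
    # Words with a repeated letter can never be part of a solution.
    masks = [(letter_mask(w), w) for w in words if len(set(w)) == len(w)]
    results = []

    def search(start, used, chosen):
        if len(chosen) == k:
            results.append(tuple(chosen))
            return
        for i in range(start, len(masks)):
            mask, word = masks[i]
            if mask & used == 0:  # shares no letter with the words chosen so far
                chosen.append(word)
                search(i + 1, used | mask, chosen)
                chosen.pop()

    search(0, 0, [])
    return results


words = ["fjord", "gucks", "hello", "nymph", "vibex", "waltz", "words"]
print(find_disjoint_words(words, 5))
```

The bitmask is the "machine structures" part: checking whether two words overlap becomes a single AND instruction instead of a set intersection.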
However, using mixed integer programming (a simple formulation around set covering), we can get this down to 10 seconds, (360 * 24 * 32) times faster than Matt's code.
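One plausible way to write such a formulation down, assuming the task is to choose five words from a word list W so that no letter is used twice, is sketched below. The notation is mine and the exact model used for the 10 second figure may differ; strictly speaking this version is a set-packing feasibility problem, a close cousin of set covering.

```latex
% x_w = 1 if word w is selected, 0 otherwise (hypothetical notation)
\begin{align*}
\text{find}\quad        & x_w \in \{0, 1\}                       && \forall w \in W \\
\text{subject to}\quad  & \sum_{w \in W} x_w = 5                 && \text{(pick exactly five words)} \\
                        & \sum_{w \in W \,:\, \ell \in w} x_w \le 1 && \forall \ell \in \{a, \dots, z\} \quad \text{(each letter used at most once)}
\end{align*}
```

With five-letter words that have no repeated letters, any feasible assignment uses 25 distinct letters, so no objective function is needed.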
An alternate approach using pySMT (satisfiability modulo theories) has an estimated runtime of around 11 hours. My formulation of this as an SMT problem finds one of the 11 assignments of words for this problem
in 1 hour.
These are overkill solutions, but they have a certain mathematical elegance that makes me really happy.