current state of llms


(disclaimer: no, i will not write often about llms, but i think it's important to state my thoughts somewhere)

intro


okay, so llms are super weird to talk about because there is a huge gulf between their hype and reality. so many groups are saying that llms are ultra smart sentient super-computers that can do anything and everything and will replace everybody.
however, the people that hype up llms in this way always fit into one of two groups: a) they stand to gain from increased llm adoption/hype, or b) they have no clue about the technical specifics of llms and are just going off of the hype.
so in this post i wanted to do three things:
  1. clarify the "real" state of llms in terms of capabilities and real-world uses
  2. clarify my own personal viewpoints on llms
  3. clarify my own personal viewpoints on llm-creating companies themselves


what is an llm


we need to start with what is an llm, because knowing this will help frame the rest of the discussion.
an llm is literally just very very fancy auto-complete. it takes in a list of tokens, and using some fancy math + settings + training data, tries to predict the next token in the sequence.
so, let's look at a pseudocode example:

text = "the house is red"
tokens = tokenize(text)

tokens is now [54, 256, 12, 28]


result = []
while (next_token := get_next_token(tokens)) != STOP_TOKEN:
    result.append(next_token)
    tokens.append(next_token)

result is now [278, 279, 300, 301, 302, 304]

result_text = decode(result)

result_text is now "Which house are you referring to?"


obviously, this code doesn't work, it's pseudocode lol. but it gets my point across:

  1. our input text is transformed into a list of tokens.
  2. the llm is doing fancy math to figure out the next token.
  3. these tokens are generated till a special stop token is emitted.
  4. the result is decoded to regular text.

how is the next token decided? fancy math, lots of it.
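if you want a version of that loop you can actually run, here's a toy one where the "model" is just a hard-coded lookup table. to be clear: the token ids, vocab, and helper functions below are all made up for illustration; in a real llm, get_next_token is a giant pile of matrix math.

STOP_TOKEN = -1

# made-up "weights": maps the last token seen to the next token to emit
NEXT = {28: 278, 278: 279, 279: 300, 300: 301, 301: 302, 302: 304, 304: STOP_TOKEN}
# made-up vocab for decoding token ids back into text
VOCAB = {278: "Which", 279: " house", 300: " are", 301: " you", 302: " referring", 304: " to?"}

def tokenize(text):
    # pretend tokenizer: always returns the ids from the example above
    return [54, 256, 12, 28]

def get_next_token(tokens):
    # in a real llm, this is where the fancy math happens
    return NEXT[tokens[-1]]

def decode(tokens):
    return "".join(VOCAB[t] for t in tokens)

tokens = tokenize("the house is red")
result = []
while (next_token := get_next_token(tokens)) != STOP_TOKEN:
    result.append(next_token)
    tokens.append(next_token)

print(decode(result))  # Which house are you referring to?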

how capable are llms?


capable


well, it depends on the model and the task.
if you need to generate human-language text, they're pretty decent if the text is a) fact-based and b) in their training data.
if you need to generate small code snippets, like a bash script, they're pretty decent.
if you need to do basic text task automation (e.g. make this list of filenames shorter), they're decent.
basically, any task that you can give a middle-schooler and expect a decent result out of will be suitable for an llm.
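to make that concrete, here's roughly what handing a low-stakes text task (the filename example) to a hosted model looks like. this assumes the openai python sdk and an api key in the environment; the model name, filenames, and prompt are just placeholders, and any other provider or a local model would look about the same.

from openai import OpenAI

# assumes OPENAI_API_KEY is set in the environment
client = OpenAI()

filenames = [
    "final_report_v2_ACTUALLY_final_reviewed_by_jim.docx",
    "meeting_notes_2023_01_05_budget_planning_team_sync.txt",
]

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder, use whatever model you have access to
    messages=[{
        "role": "user",
        "content": "shorten these filenames, keep the extensions:\n" + "\n".join(filenames),
    }],
)
print(resp.choices[0].message.content)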

not capable


llms start performing really badly when the task is actually difficult or requires logic.
so for example, working on a large codebase, doing complicated math, writing opinion pieces on new texts/events, things like that.
frankly, i have no clue how some people claim llms are good programmers, as i have yet to get a good result from them.

enhancing llm capabilities


there are currently two (main) ways that people can enhance an llm (aside from training a new model).

rag - retrieval augmented generation


rag is a very fancy way of describing these steps:
  1. retrieve text relevant to the user's input (usually via embedding similarity search) from a pre-made database
  2. give this retrieved text to the llm + tell it to generate an answer based on the query and the retrieved text
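here's a minimal sketch of those two steps. embed() and llm() are hypothetical stand-ins for whatever embedding model and llm you're using; the "retrieval" is just cosine similarity over a pre-computed database.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rag_answer(query, database):
    # database is a list of (text, embedding) pairs built ahead of time
    query_vec = embed(query)  # hypothetical embedding function
    # step 1: retrieve the most relevant stored text
    best_text, _ = max(database, key=lambda pair: cosine(query_vec, pair[1]))
    # step 2: hand the retrieved text + the query to the llm
    prompt = (
        "answer the question using only the context below.\n"
        "context: " + best_text + "\n"
        "question: " + query
    )
    return llm(prompt)  # hypothetical llm call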

finetuning


finetuning is similar to training: you give the llm a list of queries + accepted answers and nudge the model's weights so that it produces those answers.
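as a rough sketch, a finetuning dataset is basically a pile of query + accepted-answer pairs, something like the below. the chat-style shape is common, but the exact file format depends on your provider/tooling, and the example content here is made up.

# each example pairs a query with the answer you want the model to learn to give
finetune_examples = [
    {
        "messages": [
            {"role": "user", "content": "what's our refund window?"},
            {"role": "assistant", "content": "30 days from delivery, no questions asked."},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "do we ship to canada?"},
            {"role": "assistant", "content": "yes, via the standard international tier."},
        ]
    },
]
# these usually get dumped to a jsonl file and handed to the finetuning job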

does it change how capable an llm is?


kind of? rag is more about solving the problem of keeping an llm's context window small, and also giving an llm up-to-date info. finetuning can help specialize an llm. but these don't change the fundamental problem: llms just aren't good at logical problem solving. they just can't "think."

my views on llms


usage for learning


just don't. llms are just way too unreliable to be a good source of truth. i see too many people put wayyyyy too much trust in llms, thinking they're this super awesome tool, and they just don't learn. they end up spinning their wheels like it's nobody's business.

usage for doing (?)


i think it depends. can they be useful for automating the boring stuff (reference unintended)? probably, yeah. like i mentioned, basic text generation/transformation where the stakes are low is a pretty good case for them. and extracting explicit information from a body of text (like dates, author names, etc) is also a pretty good case.
a better/more explicit criterion i can come up with is: if you can verify the output of the llm in a reasonable amount of time, then yes, the llm can help with the task.
however, i think that people should be careful with larger tasks. i've heard of a lot of programmers losing their edge because they use an llm to help them refactor or code. sometimes the easy tasks are the exact tasks that "keep us in shape". plus, if you use an llm for a larger task, you don't really net any time saved. look at my super-duper algorithm:


time_saved = time_you_would_take_to_do_task - time_llm_takes_to_do_task

if (task_important) {
    time_saved -= time_to_verify
    // for an important task, time_llm_takes + time_to_verify ends up being roughly
    // time_you_would_take, so time_saved comes out to about zero
}


a lot of people also try and skip the learning and use llms for doing important tasks, and frankly that's just dumb. if the task is important, and you can't verify the llm's output, then you're just opening yourself up to crazy risk.

other usages


i think llms as chatbots/a more user-friendly ui is a neat idea. (companies need to learn that customers do eventually need the opportunity to speak to a human to get problems resolved, though.) i'm actually excited for when a small llm can be embedded into apps without using up too many resources.
generating text/image embeddings? holy shit, this is the real greatest invention. frankly i think this is far more useful than any of the "traditional"/non-technical usages of llms. text/image similarity search is awesome and i think every dev should make at least one project working with this.
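for example, here's a tiny similarity search sketch using sentence-transformers (one library of several that can do this; the model name is just a common default, and the docs/query are made up):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a common small embedding model

docs = [
    "how to reset your password",
    "quarterly sales report, q3",
    "fixing a flat bike tire",
]
doc_vecs = model.encode(docs)  # one vector per doc
query_vec = model.encode("i forgot my login")

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(query_vec, v) for v in doc_vecs]
print(docs[int(np.argmax(scores))])  # -> "how to reset your password"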

my view on llm companies


this is just madness. hank green recently made a good video[1] outlining exactly how the llm-tech-space is just a huge bubble. i frankly just dont understand how this has been going on for this long. greed sickens me. and im tired of every gd application or company creating some llm powered bs whatever the fuck. screw that shit.
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa

tldr


use llms when a) you can verify the output and b) the output doesn't need a lot of verification.
llm companies suck.

links :)