Friday, April 28, 2023

28: ChatGPT

Watch an A.I. Learn to Write by Reading Nothing but Shakespeare: They are trained by going through mountains of internet text, repeatedly guessing the next few letters and then grading themselves against the real thing. ........ The largest language models are trained on over a terabyte of internet text, containing hundreds of billions of words. Their training costs millions of dollars and involves calculations that take weeks or even months on hundreds of specialized computers. ........ They learn statistical patterns that piece words together into sentences and paragraphs. ........ Our model has learned which letters are most frequently used in the text. You’ll see a lot of the letter “e” because that is the most common letter in English. .......... It usually doesn’t copy and paste sentences verbatim; instead, BabyGPT stitches them together, letter by letter, based on statistical patterns that it has learned from the data. ......... They can learn the form of a sonnet or a limerick, or how to code in various programming languages.
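The "guess the next letter, then grade yourself" idea can be sketched in a few lines of Python. This is only a toy illustration and not how BabyGPT or any real language model is built: it counts which letter follows which (a bigram table) instead of training a neural network, and the sample text and the function names (`letter_frequencies`, `train_bigram`, `average_loss`, `generate`) are assumptions made up for this example.

```python
import math
import random
from collections import Counter, defaultdict

# Toy stand-in for the training text (an assumption; the article used Shakespeare).
TEXT = "to be or not to be that is the question"

def letter_frequencies(text):
    """Count how often each character appears in the text."""
    return Counter(text)

def train_bigram(text):
    """For every character, count which characters follow it."""
    follows = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        follows[current][nxt] += 1
    return follows

def next_char_probs(follows, current):
    """Turn the follow-counts for one character into probabilities."""
    counts = follows[current]
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def average_loss(follows, text):
    """Grade the guesses: mean negative log-probability given to the real next character."""
    losses = []
    for current, nxt in zip(text, text[1:]):
        p = next_char_probs(follows, current).get(nxt, 1e-9)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

def generate(follows, start, length, seed=0):
    """Stitch text together letter by letter, sampling from the learned counts."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        probs = next_char_probs(follows, out[-1])
        if not probs:  # this character was never seen with a successor
            break
        chars = list(probs)
        weights = [probs[c] for c in chars]
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

if __name__ == "__main__":
    model = train_bigram(TEXT)
    print(letter_frequencies(TEXT).most_common(3))
    print(round(average_loss(model, TEXT), 3))
    print(generate(model, "t", 30))
```

A lower `average_loss` means the model put more probability on the letters that actually came next. Real models reduce this kind of score by adjusting millions or billions of parameters rather than by counting.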

Generative because it generates words.

Pre-trained because it’s trained on a bunch of text. This step is called pre-training because many language models (like the one behind ChatGPT) go through important additional stages of training known as fine-tuning to make them less toxic and easier to interact with.

Transformers are a relatively recent breakthrough in how neural networks are wired. They were introduced in a 2017 paper by Google researchers, and are used in many of the latest A.I. advancements, from text generation to image creation.

Transformers improved upon the previous generation of neural networks — known as recurrent neural networks — by including steps that process the words of a sentence in parallel, rather than one at a time. This made them much faster.
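A rough way to see the difference is the sketch below. It is a heavily simplified assumption for illustration: each word is a single number, and there are no learned weights apart from one fixed constant, so it is neither a real recurrent network nor a real transformer. The function names are invented for this example. The point is only that the recurrent-style loop needs the previous step's result before it can continue, while each attention-style output depends on the raw inputs alone and could be computed at the same time as the others.

```python
import math

def rnn_style(inputs, weight=0.5):
    """Recurrent-style pass: each state needs the previous state, so steps run one at a time."""
    state = 0.0
    states = []
    for x in inputs:
        state = math.tanh(weight * state + x)
        states.append(state)
    return states

def attention_row(i, inputs):
    """Attention-style output for position i: a softmax-weighted average of all inputs."""
    scores = [inputs[i] * x for x in inputs]      # similarity of position i to every position
    peak = max(scores)                            # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return sum(e / total * x for e, x in zip(exps, inputs))

def attention_style(inputs):
    """Every row is independent of the others, so the rows could be computed in parallel."""
    return [attention_row(i, inputs) for i in range(len(inputs))]

if __name__ == "__main__":
    sentence = [0.2, -1.0, 0.7, 1.5]  # toy numbers standing in for words
    print(rnn_style(sentence))
    print(attention_style(sentence))
```

Here the rows are still computed in an ordinary Python loop. The speedup the article describes comes from specialized hardware that can evaluate all of those independent rows at once.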

GPT-3 was trained on up to a million times as many words as the models in this article. Scaling up to that size is a huge technical undertaking, but the underlying principles remain the same. .......... As language models grow in size, they are known to develop surprising new abilities, such as the ability to answer questions, summarize text, explain jokes, continue a pattern and correct bugs in computer code. ........... Some researchers have termed these “emergent abilities” because they arise unexpectedly at a certain size and are not programmed in by hand. The A.I. researcher Sam Bowman has likened training a large language model to “buying a mystery box,” because it is difficult to predict what skills it will gain during its training, and when these skills will emerge. ............... They are also prone to inventing facts and reasoning incorrectly. Researchers do not yet understand how these models generate language, and they struggle to steer their behavior.

Peering Into the Future of Novels, With Trained Machines Ready: Who wrote it, the novelist or the technology? How about both? Stephen Marche experiments with teaching artificial intelligence to write with him, not for him. ........ The journalist and author Stephen Marche wrote “Death of an Author” using three artificial intelligence programs. Or three artificial intelligence programs wrote it with extensive plotting and prompting from Stephen Marche. It depends on how you look at it. .......... “I am the creator of this work, 100 percent,” Marche said, “but, on the other hand, I didn’t create the words.” .......... He asked if Marche was interested in using the technology to produce a murder mystery. The result of that collaboration is “Death of an Author,” in which an author who uses A.I. extensively winds up dead. ......... To coax the story from his laptop, Marche used three programs, starting with ChatGPT. He ran an outline of the plot through the software, along with numerous prompts and notes. While A.I. was good at many things, especially dialogue, he said, its plots were terrible. .......... Next, he used Sudowrite, asking the program to make a sentence longer or shorter, to adopt a more conversational tone or to make the writing sound like Ernest Hemingway’s. Then he used Cohere to create what he called the best lines in the book. If he wanted to describe the smell of coffee, he trained the program with examples and then asked it to generate similes until he found one he liked. ......... “To me, the process was a bit akin to hip-hop,” he said. “If you’re making hip-hop, you don’t necessarily know how to drum, but you definitely need to know how beats work, how hooks work, and you need to be able to put them together in a meaningful way.”
