Text Completions & Summaries with GPT-3
My favourite genre of ChatGPT interactions is when it is tasked with summarising a body of text or learning based on the data you feed it. It’s pretty meta; the machine learns that you are asking it to, well, learn.
I’m sitting on a ton of .txt files that I haven’t quite had time to play with. When OpenAI released ChatGPT and GPT-3.5, I took a look at my text-based data that I’d done some half-baked traditional NLP analysis on, and wondered, “why reinvent the wheel when you can stand on the shoulders of our giant tech corporation overlords?” Instead of spending hours on the tiresome tokenisation-stemming-stopword-etc. pipeline, why not just throw everything at GPT and let it do the work for me?
So I wrote a very short script (30 lines long!) that makes use of the OpenAI API and its Da Vinci GPT-3 model, takes in a query and text data, and returns a response to the query.
The script is horrifyingly simple (so simple that I can paste the entire thing in the Appendix below) but has a surprising number of applications. You could do things like…
- Summarise a news feed’s output over the last X minutes in Y words
- Answer reading comprehension questions based on a given passage
- Generate coherent meeting minutes based on the notes you jotted down
- Create marketing/sales copy given information on specific clientele
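All of these boil down to the same pattern: paste the query on top of the raw text and send the whole thing off as one prompt, which is exactly what the script does. A minimal sketch (the query and data here are made up for illustration):

```python
# Illustrative only: combining a query and a block of text data into a
# single prompt, mirroring the script's `query + "\n" + data` line.
query = "Summarise this news feed's output in 50 words."
data = "Headline 1: Markets rally.\nHeadline 2: Rates hold steady."
prompt = query + "\n" + data
```

The model sees no special structure, just one string, so the phrasing of the query carries all the instruction.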
I tried running it on some text files I had (files omitted for privacy reasons), and here are the results:
LinkedIn Profile Summary
Query: Create a LinkedIn profile summary for a student based on the text given.
Responses:
[screenshots of generated profile summaries omitted]
Remarks: I scraped this data from my own LinkedIn connections in 2021, hence it’s all finance bro talk (sorry if you were in the dataset). These captions are so wonderfully bland, like a Uniqlo basic white tee, like vanilla ice cream, like someone named Tan Jun Jie.
Investment Note for Kids
Query: Based on the following headlines, generate market commentary for an audience of children.
Response:
[screenshot of generated response omitted]
Remarks: The data is simply the last 3 hours of Bloomberg Markets headlines and GPT was able to piece all of it together and pick out the higher frequency tokens. It looks like the job security of sell-side research analysts is in a state of flux right now!
Essay Prompt Response
Query: Based on the given essay prompt from Ethics class, write the first paragraph of a response to the prompt.
Response:
[two screenshots of generated responses omitted]
Remarks: Running the same prompt twice gives noticeably different response structures. The first image is more of an essay summary, while the second jumps straight into answering the prompt. This is because the script sets the temperature hyperparameter to 0.9, which keeps the model’s sampling stochastic rather than “deterministic”. As far as I know, there’s no random seeding option for GPT-3.
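Temperature controls how the next-token distribution gets sampled: at 0 the model always picks the most likely token, while higher values flatten the distribution and let lower-probability tokens through. A toy sketch of the sampling rule (not the real model, just the mechanism, with made-up logits):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    # Toy illustration: temperature 0 collapses to argmax (greedy),
    # while higher temperatures sample from a softened softmax.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs)[0]

rng = random.Random(0)
logits = [2.0, 1.0, 0.5]  # token 0 is the most likely
greedy = [sample_with_temperature(logits, 0, rng) for _ in range(5)]
warm = [sample_with_temperature(logits, 0.9, rng) for _ in range(20)]
```

At temperature 0 every run returns token 0; at 0.9 (the script’s setting) the draws wander across all three tokens, which is why the two essay responses came out so differently.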
Takeaways
There is a real limitation that stops this from going far beyond personal use: the maximum token count allowed for the request and response combined is 4,000 for the Da Vinci GPT-3 model (see docs). For the above prompts, the script took about 4–5 seconds to run. This is a limit of the model’s context window rather than an API paywall, so no matter what you pay, the query and response simply cannot be too long.
For some applications, this would be easy to work around; for example, if you are summarising a research paper for school, you can partition the paper’s text into multiple files, generate a summary for each one, then recombine them. If the application requires GPT-3 to “see” all the data at once for better results, it might be harder (unless there is some way to do random sampling and “ensemble” the results).
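The partition-then-recombine workaround can be sketched in a few lines. The chunk size and the characters-per-token ratio below are rough assumptions, and `summarise` stands in for a call like the script’s `chatgpt_generator`:

```python
def chunk_text(text, max_chars=8000):
    # Rough proxy: ~4 characters per token, so 8,000 chars is about
    # 2,000 tokens, leaving room for the query and the response
    # under the 4,000-token cap.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarise_all(text, summarise):
    # Summarise each chunk independently, then summarise the
    # concatenated partial summaries into one final answer.
    partial = [summarise(chunk) for chunk in chunk_text(text)]
    return summarise("\n".join(partial))
```

The obvious caveat is that each chunk is summarised blind to the others, so anything that only makes sense across chunk boundaries can get lost in the final pass.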
I like Bayesian thinking as much as the next guy, but probabilistic AI won’t quite take over the world just yet. GPT-3 is trained to identify relationships between words in order to predict the next one, which is why it’s still up to us to come up with good, specific queries and play with the levers that help it learn (see this article for a practical guide) if we want to get the most value out of it.
Appendix
# gpt_generator.py
# by: Jolie
import openai

openai.api_key = "YOUR_API_KEY"  # replace with your own API key

def chatgpt_generator(query, data):
    # Stick the query on top of the raw text data and send the whole
    # thing to the Davinci completion model as one prompt.
    output = openai.Completion.create(
        model="text-davinci-003",
        prompt=query + "\n" + data,
        temperature=0.9,
        max_tokens=200,
        frequency_penalty=1.0,
        presence_penalty=1.0,
    )
    return output

if __name__ == "__main__":
    text_file = input("Name of data file (must be .txt): ")
    my_query = input("Query for ChatGPT: ")
    with open(text_file, 'r') as file:
        my_data = file.read()
    print('Query:')
    print('-' * 12)
    print(my_query)
    response = chatgpt_generator(my_query, my_data)
    print('Response:')
    print('-' * 12)
    print(response['choices'][0]['text'])