We are back! At this stage of our journey we are going to build ChatGPT... well, not exactly ChatGPT, but a smaller, slimmer version that can barely hold a conversation for more than a few prompts (which I guess is pretty similar to GPT if you really think about it). Nonetheless, we are on a journey, and like the pyramids, you can't build something that grand overnight.
Back in Google Colab we can start by installing the necessary package. We are going to be leveraging the transformers Python package, which gives us access to already pre-trained models and abstracts away the complexity we would face if we were to build and train our own. Its Pipeline API is the simplest way in; in this post we use the Auto classes, which sit one level below it. We can run the following to install the package.
!pip install -qU transformers
After installing the package we can choose whichever model we want. Just make sure you pick one with the Text Generation tag on Hugging Face, otherwise your conversation won't be much of a conversation. Start by importing the packages that we need.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
Next we can load the model!
model_name = "microsoft/DialoGPT-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
We are using DialoGPT-medium, a small model with only 345 million parameters. For context, Hugging Face also hosts Ling-1T, a 1-trillion-parameter model (we can't run that in Colab, sadly :( ). But even with a smaller, older model we can still have some fun.
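To put those two parameter counts side by side, here is a rough back-of-the-envelope sketch. It assumes the weights are the only thing in memory (no activations, no framework overhead) and that each parameter takes 4 bytes in float32 or 2 bytes in float16. The helper name `weight_memory_gb` is made up for this post.

```python
# Rough weight-memory estimate: number of parameters x bytes per parameter.
# Assumption: weights only, ignoring activations and framework overhead.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def weight_memory_gb(num_params, dtype="float32"):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

dialogpt_medium_gb = weight_memory_gb(345_000_000)                # float32
one_trillion_gb = weight_memory_gb(1_000_000_000_000, "float16")  # float16

print(f"DialoGPT-medium: ~{dialogpt_medium_gb:.2f} GB")
print(f"1T-parameter model: ~{one_trillion_gb:.0f} GB")
```

So the small model is roughly 1.4 GB of weights, while a 1-trillion-parameter model would need around 2,000 GB even at half precision, which is why it stays out of reach for a single Colab session.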
# Store the conversation so far - we do this so the model can follow the conversation flow
chat_history = None
Next, run a while loop, which you can terminate at any time during the conversation.
print("Chatbot ready! (type 'quit' to exit)\n")
while True:
    user_input = input("You: ")
    if user_input.lower() in ["quit", "exit", "bye"]:
        print("Bot: Goodbye!")
        break
    # Encode the user's input as a tensor of token IDs, ending the turn with the EOS token
    new_input = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors='pt')
    # Append the new input to the chat history, or start fresh if there is no history yet
    bot_input = torch.cat([chat_history, new_input], dim=1) if chat_history is not None else new_input
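If the concatenation step feels abstract, here is a minimal sketch of the same idea using plain Python lists instead of tensors. The token IDs are made up for illustration (they are not real tokenizer output), `EOS` is a stand-in for `tokenizer.eos_token_id`, and `add_turn` is a hypothetical helper.

```python
# Hypothetical token IDs, for illustration only (not real tokenizer output).
EOS = 0  # stand-in for tokenizer.eos_token_id

def add_turn(history, turn_tokens):
    """Append one turn, followed by an end-of-sequence marker, to the running history."""
    return history + turn_tokens + [EOS]

history = []
history = add_turn(history, [11, 12])      # user: first message
history = add_turn(history, [21, 22, 23])  # bot: reply
history = add_turn(history, [31])          # user: second message

print(history)  # [11, 12, 0, 21, 22, 23, 0, 31, 0]
```

Each turn ends with the EOS marker, so the model sees one long sequence in which the turns are clearly separated. That is all the "memory" this chatbot has.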
The next thing we need to do in our loop is generate the response from the model. We do that using the model.generate() method in the Transformers library. I'll include comments describing what each parameter is doing in the code.
    chat_history = model.generate(
        bot_input,  # The input token IDs: the conversation so far plus the new user message
        max_length=1000,  # Maximum total length in tokens (this total also includes the input tokens)
        pad_token_id=tokenizer.eos_token_id,  # Token used for padding; this tokenizer has no dedicated pad token, so we reuse EOS
        do_sample=True,  # Sample the next token from the predicted distribution instead of always taking the most likely one - by default this is disabled
        top_p=0.95,  # Nucleus sampling: only sample from the smallest set of most likely tokens whose cumulative probability reaches 0.95
        temperature=0.8,  # Controls the randomness of the sampling - lower = more predictable, higher = more varied
    )
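To make temperature and top_p a little less magical, here is a simplified, pure-Python sketch of the two ideas. It is an illustration under my own assumptions, not the actual Transformers implementation: the function names and the four example logits are made up.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.95):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}  # renormalised over the surviving tokens

logits = [4.0, 3.0, 1.0, -2.0]  # hypothetical scores for a 4-token vocabulary
probs = softmax_with_temperature(logits, temperature=0.8)
candidates = top_p_filter(probs, top_p=0.95)
print(candidates)  # only token indices 0 and 1 survive the filter
```

With these made-up numbers the two most likely tokens already cover more than 95% of the probability, so the long tail is dropped and the next token would be drawn from just those two.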
Finally, we decode and print the response from the model. And as we can see from the output, we are pretty far from creating the next ChatGPT, but it's a start!
    # Slice off the prompt tokens so we only decode the newly generated reply
    response = tokenizer.decode(chat_history[:, bot_input.shape[-1]:][0], skip_special_tokens=True)
    print("Bot:", response)
Chatbot ready! (type 'quit' to exit)
You: Hello!
Bot: Hi!! Thank you!
You: How are you?
Bot: I'm pretty good! How are you?
You: I'm okay, what are you doing right now?
Bot: Just getting ready for some work, you?
You: Same, what do you do for you?
Bot: Im going to a party tonight but I have a lot of things to do.
You: What work?
Bot: Nothing really.
You: Are you sure? You said earlier you are getting ready for work.
Bot: Yes, I am.
You: So what work do you do?
Bot: What do you want to do?
You: Depends, what are you good at?
Bot:
You: So nothing?
Bot:
You: exit
Bot: Goodbye! 👋
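Looking back at the decode step, the slicing is what separates the bot's new reply from everything we fed in. Here is a small sketch with nested Python lists standing in for the tensors; the token IDs are made up and `EOS` is a stand-in for `tokenizer.eos_token_id`.

```python
# Hypothetical token IDs standing in for real tensors (one row = one conversation).
EOS = 0
bot_input = [[11, 12, EOS]]                  # what we passed to generate()
chat_history = [[11, 12, EOS, 21, 22, EOS]]  # what generate() handed back

prompt_length = len(bot_input[0])               # plays the role of bot_input.shape[-1]
reply_tokens = chat_history[0][prompt_length:]  # everything after the prompt

print(reply_tokens)  # [21, 22, 0]
```

One possible reason for the blank replies near the end of the transcript: if the model emits the EOS token straight away, the reply is nothing but a special token, and decoding with skip_special_tokens=True would leave an empty string. I have not verified that this is what happened here, so treat it as a guess.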
So it's not the greatest of chatbots, but honestly this exercise was really just a tune-up for some bigger things that I have in my head. There are a couple of projects that I think AI/LLMs can help me with, but before you can run you first have to learn how to walk. Pretty sure some famous person said that, or maybe it was Iron Man, idk, but either way this is a journey, not a sprint. Until next time!
No AI used in the making of this post, that I know of at least 😀