AI Journey - The Land of Chains

Chains...LangChain

If you have ever used an LLM, for example ChatGPT, and asked it to "search for the most up to date information", it's going to go out and search the web for the most accurate information and then provide the sources where it retrieved that information from. So how exactly does it do that if its base model is not trained on that information? It may use something like LangChain to help it retrieve that information. LangChain is a framework for developing applications powered by language models. It can be used for chatbots, Generative Question-Answering (GQA), summarization, and much more. In this part of our journey we are going to build a simple question-and-response application using LangChain.
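Before we get to the real thing, the core idea of a "chain" can be sketched in plain Python: a prompt template is filled in and piped into a model, and the output comes back out. This is only a conceptual sketch with a stub function standing in for an actual LLM, not LangChain code.

```python
# Conceptual sketch of a "chain": template -> model -> output.
# `fake_llm` is a hypothetical stand-in for a real language model call.

def format_prompt(question: str) -> str:
    """Fill a simple template with the user's question."""
    template = "Answer the following question: {question}"
    return template.format(question=question)

def fake_llm(prompt: str) -> str:
    """Stub model: just echoes what it received."""
    return f"[model response to: {prompt}]"

def chain(question: str) -> str:
    """Compose the steps, the way LangChain pipes a prompt into an LLM."""
    return fake_llm(format_prompt(question))

print(chain("What is LangChain?"))
```

LangChain's job is essentially to give you well-tested building blocks for this kind of composition, plus integrations with real models and tools.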

Prompting without a web search

To get started we need to install a few packages in our Google Colab environment.


    !pip install --quiet \
    google-colab==1.0.0 \
    requests==2.32.4 \
    langchain==0.3.27 \
    langchain-core==0.3.79 \
    huggingface_hub \
    langchain_huggingface \
    langchain-community

Since we are using Hugging Face, make sure to load in your API key. I'm not sure if I have covered this in any of the previous journey articles, but the following code will prompt you to log in.


    from huggingface_hub import notebook_login
    notebook_login()
    

We can then load in the libraries that we installed earlier.


    from transformers import pipeline
    from langchain_huggingface import HuggingFacePipeline
    from langchain_core.prompts import PromptTemplate
    

The rest of the code will be as follows.


    # Get the input
    question = input("What is your question? ")

    # Template that is fed into the LLM model
    template = "{question}"
    prompt = PromptTemplate(template=template, input_variables=["question"])

    # Load the model as a Hugging Face text-generation pipeline
    model_id = "Qwen/Qwen2.5-1.5B-Instruct"
    pipe = pipeline(
        "text-generation",
        model=model_id,
        max_new_tokens=256,
    )

    # Wrap the pipeline for LangChain and pipe the prompt into it
    llm = HuggingFacePipeline(pipeline=pipe)
    llm_chain = prompt | llm

    response = llm_chain.invoke({"question": question})
    print(response)
    

We are using the Qwen2.5 1.5-billion-parameter instruction-tuned model, so it follows what we put in our template better than a standard base model would. The question I used in the prompt was the following: "Who is the current President of the United States of America?" and below is the answer that was generated by the model.


    Who is the current President of the United States of America? The current President of the United States of America is Joe Biden. He took office on January 20, 2021. Prior to his election as president in November 2020, he had served as Vice President since taking office following Donald Trump's defeat in the 2016 presidential election. Before that, he was a Senator from Delaware for many years and previously served as Attorney General of Delaware. His term will end on January 20, 2025.
    
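Notice that the response starts by repeating the question. By default, a transformers text-generation pipeline returns the prompt plus the generated text. One simple way to clean that up is to slice the prompt off the front of the string yourself; in this sketch, `response` is a shortened stand-in for the string that `llm_chain.invoke()` returns.

```python
# The pipeline output includes the original prompt, so slice it off.
# `response` here is a hypothetical stand-in for the real model output.
question = "Who is the current President of the United States of America?"
response = question + " The current President of the United States of America is Joe Biden."

# Drop the echoed prompt if the response starts with it.
answer = response[len(question):].strip() if response.startswith(question) else response
print(answer)
```

Alternatively, the transformers text-generation pipeline accepts `return_full_text=False`, which makes it return only the newly generated text.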

Now that isn't bad, but the answer isn't correct given that I am writing this article in October of 2025. If you look at the model card on Hugging Face, in the citation section we can see that this model was released in September of 2024. With that in mind, we can assume the training data it is using is not up to date, so it wouldn't know who the current US President is. What if we use a LangChain tool? Maybe then it will give us the right answer.

Prompting with a web search

Now if we want to search the web, there is really only a slight change that we need to make to our code, since we already installed the necessary packages. We are going to be using BraveSearch (I was having some issues with DuckDuckGo search since the latest LangChain update). In order to use Brave you will need to sign up for a free account, and then you can create an API key.


    from transformers import pipeline
    from langchain_huggingface import HuggingFacePipeline
    from langchain_core.prompts import PromptTemplate
    from langchain_community.tools import BraveSearch
    from langchain_core.runnables import RunnableLambda
    

The rest of the code is as follows; just replace "BRAVE_API_KEY_HERE" with your own API key.


    # Get the input
    question = input("What is your question? ")

    # Replace with your actual Brave API key
    brave_api = "BRAVE_API_KEY_HERE"

    # Brave search tool function
    search = BraveSearch.from_api_key(api_key=brave_api, search_kwargs={"count": 5})
    def fetch_results(q):
        return {"question": q, "search_results": search.run(q)}

    # Template that is fed into the LLM model
    template = """Use the web search results to answer the question.
    Cite sources inline by title if helpful.

    Question: {question}

    Web search results:
    {search_results}

    Answer:"""

    prompt = PromptTemplate(template=template, input_variables=["question", "search_results"])

    model_id = "Qwen/Qwen2.5-1.5B-Instruct"
    pipe = pipeline("text-generation", model=model_id, max_new_tokens=256)
    llm = HuggingFacePipeline(pipeline=pipe)

    llm_chain = RunnableLambda(fetch_results) | prompt | llm
    print(llm_chain.invoke(question))
    
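As an aside, `search.run()` hands back the results as a JSON-formatted string of objects with `title`, `link`, and `snippet` fields, so you can also parse it yourself if you want a cleaner sources list than dumping the raw string into the prompt. This is a sketch; `raw_results` is a shortened stand-in for what the tool actually returns.

```python
# Parse Brave-style results (a JSON string of [{"title", "link", "snippet"}, ...])
# into a numbered sources list. `raw_results` is a hypothetical stand-in
# for the string returned by search.run(q).
import json

raw_results = '[{"title": "The White House", "link": "https://www.whitehouse.gov/", "snippet": "President Donald J. Trump..."}]'

def format_sources(raw: str) -> str:
    """Turn the JSON results string into a numbered title/link list."""
    results = json.loads(raw)
    return "\n".join(f"{i}. {r['title']} - {r['link']}" for i, r in enumerate(results, 1))

print(format_sources(raw_results))
```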

Running the same question as before "Who is the current President of the United States of America?" we get the following response.


    What is your question? Who is the current President of the United States of America?

    Use the web search results to answer the question.
    Cite sources inline by title if helpful.

    Question: Who is the current President of the United States of America?

    Web search results:
    [{"title": "Presidents, vice presidents, and first ladies | USAGov", "link": "https://www.usa.gov/presidents", "snippet": "U.S. head of state ... The 47th and current president of the United States is Donald John Trump."}, {"title": "President of the United States - Wikipedia", "link": "https://en.wikipedia.org/wiki/President_of_the_United_States", "snippet": "Ronald Reagan, who had been an ... the Cold War, the United States became the world's undisputed leading power. Bill Clinton, George W. Bush, and Barack Obama each served two terms as president...."}, {"title": "The White House", "link": "https://www.whitehouse.gov/", "snippet": "President Donald J. Trump and Vice President JD Vance are committed to lowering costs for all Americans, securing our borders, unleashing American energy dominance, restoring peace through strength, and making all Americans safe and secure once ..."}, {"title": "Donald Trump - Wikipedia", "link": "https://en.wikipedia.org/wiki/Donald_Trump", "snippet": "Donald John Trump (born June 14, 1946) is an American politician, media personality, and businessman who is the 47th president of the United States."}, {"title": "President of the United States - Ballotpedia", "link": "https://ballotpedia.org/President_of_the_United_States", "snippet": "The President of the United States (POTUS) is the head of the United States government. Article II of the U.S. Constitution laid out the requirements and roles of the president.[1] The current President of the United States is Donald Trump (R)."}]

    Answer: The current President of the United States of America is Donald Trump (R). He has held office since January 20, 2017. 

    Sources:

    1. https://www.usa.gov/presidents
    2. https://en.wikipedia.org/wiki/President_of_the_United_States
    3. https://www.whitehouse.gov/
    4. https://en.wikipedia.org/wiki/Donald_Trump
    5. https://ballotpedia.org/President_of_the_United_States
    

The answer we get, "The current President of the United States of America is Donald Trump (R). He has held office since January 20, 2017", is not entirely correct: the January 2017 date refers to the start of his first term, not the current one. But in the web search results we can see "The 47th and current president of the United States is Donald John Trump", so it's pretty close, and we can infer the right answer from the response.

Wrapping Up

And that is that. In this part of the journey we learned to extend the capabilities of an LLM using LangChain to get more up-to-date data. Maybe the next step is creating an agent? Who knows, but the journey keeps growing, and I have learned a lot about LLMs and how they work under the hood.

Prost!

No AI used in the making of this post, that I know of at least 😀