Wednesday, 18 March 2026
Creating a Qwen-Powered Lightweight Personal Assistant
Image by Editor | Midjourney
Introduction
The Qwen family provides powerful, open-source large language models for a wide range of natural language processing tasks.
This article shows you how to set up and run a personal assistant application in Python powered by a Qwen model — specifically the Qwen1.5-7B-Chat model, which is an efficient and relatively lightweight 7-billion-parameter chat model optimized for conversational use cases. The code shown is ready to be used in a Python notebook such as Google Colab, but can easily be adapted to run locally if preferred.
Coding Solution
Since building a Qwen-powered assistant requires several dependencies, we start by installing them and checking the installed versions to ensure compatibility, as far as possible, with any packages you may already have in your environment.
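In a fresh Colab session, something along the following lines is typically enough; note that the exact package list and version pin here are assumptions, not the article's original setup:

```shell
# Install the libraries the assistant relies on (list is an assumption):
# transformers for the model and tokenizer, accelerate and bitsandbytes
# for quantized loading, and ipywidgets for the simple notebook UI.
pip install -q "transformers>=4.37" accelerate bitsandbytes ipywidgets

# Print the installed versions to verify compatibility
python -c "import transformers, torch; print(transformers.__version__, torch.__version__)"
```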
We use Qwen/Qwen1.5-7B-Chat, which allows for faster first-time loading and inference than heavier members of the family such as Qwen2.5-Omni, a real powerhouse but far less lightweight.
As usual when loading a pre-trained language model, we need a tokenizer that converts text inputs into a format the model can read. Luckily, the AutoTokenizer class from Hugging Face's Transformers library streamlines this process.
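A minimal sketch of that step, using the model identifier assumed throughout this article:

```python
from transformers import AutoTokenizer

model_name = "Qwen/Qwen1.5-7B-Chat"

# Download (or load from the local cache) the tokenizer matching the model.
# trust_remote_code allows any custom tokenizer code shipped with the repo.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
```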
To enhance efficiency, we first try to load the model with 4-bit quantization, which substantially reduces memory usage.
import time

import torch
from transformers import AutoModelForCausalLM

model_name = "Qwen/Qwen1.5-7B-Chat"
start_time = time.time()

# Load Qwen1.5-7B-Chat model - publicly available and efficient to run in Google Colab with a T4 GPU
# First, try to load the model with 4-bit quantization for efficiency
try:
    print("Attempting to load model with 4-bit quantization...")
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,  # Use bfloat16 for better performance
        device_map="auto",
        trust_remote_code=True,
        quantization_config={"load_in_4bit": True}  # 4-bit quantization for memory efficiency
    )
except Exception as e:
    print(f"4-bit quantization failed with error: {str(e)}")
    print("Falling back to 8-bit quantization...")
    try:
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,
            load_in_8bit=True  # Try 8-bit quantization instead
        )
    except Exception as e2:
        print(f"8-bit quantization failed with error: {str(e2)}")
        print("Falling back to standard loading (will use more memory)...")
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True
        )

load_time = time.time() - start_time
print(f"Model loaded in {load_time:.2f} seconds")
When building our own conversational assistant, it is good practice to craft a default system prompt that accompanies each request, steering the model's behavior and the responses it generates toward our needs. Here is the default prompt we will use:
system_prompt = """You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should be engaging and fun. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""
The function we define next encapsulates the heaviest part of the execution flow: it takes the user input and calls the model to perform inference and generate a response. Importantly, we want to run a conversation in which we can make multiple requests in sequence, so it is important to manage the chat history accordingly and incorporate it into each new request.
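The history bookkeeping at the core of that function can be sketched as a small helper (names here are illustrative, not the article's exact code): each call rebuilds the full message list from the system prompt, the accumulated history, and the new user input, and that list is then passed through the tokenizer's chat template before calling the model's generate method.

```python
def build_messages(system_prompt, history, user_input):
    """Assemble the chat-format message list for the next model call.

    history is a list of (user_message, assistant_reply) pairs from
    earlier turns; the full list is resent on every request so the
    model keeps the conversation context.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_msg, assistant_reply in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_reply})
    messages.append({"role": "user", "content": user_input})
    return messages

# Example: one earlier exchange plus a new question
history = [("Hi!", "Hello! How can I help you today?")]
msgs = build_messages("You are a helpful assistant.", history, "What is Qwen?")
print(len(msgs))  # system + 2 history messages + new user message = 4
```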
Once the key function to generate responses has been defined, we can build a simple user interface to run and interact with the assistant.
The interface will contain an output display area that shows the conversation, an input text box where the user can ask questions, and two buttons for sending a request and clearing the chat. Notice the use of the widgets library for these elements.
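A sketch of such an interface, assuming ipywidgets is available; the widget names and the on_send chat callback are illustrative, not the article's exact code:

```python
import ipywidgets as widgets

def create_assistant_ui(on_send=None):
    """Build the assistant UI: an output area, an input box, and two buttons."""
    output_area = widgets.Output(layout={"border": "1px solid gray", "height": "300px"})
    input_box = widgets.Text(placeholder="Ask me anything...")
    send_button = widgets.Button(description="Send", button_style="primary")
    clear_button = widgets.Button(description="Clear Chat")

    def handle_send(_):
        question = input_box.value.strip()
        if not question:
            return
        input_box.value = ""
        with output_area:  # Render the exchange in the conversation display
            print(f"You: {question}")
            if on_send is not None:
                print(f"Assistant: {on_send(question)}")

    send_button.on_click(handle_send)
    clear_button.on_click(lambda _: output_area.clear_output())

    buttons = widgets.HBox([send_button, clear_button])
    return widgets.VBox([output_area, input_box, buttons])
```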
        traceback.print_exc()  # Print the full stack trace for debugging
        return False
# Overarching function for our application: we can choose here which interface to use
def run_assistant():
    print("\nRunning quick test...")
    test_success = quick_test()
    if test_success:
        # Ask user which interface they prefer
        interface_choice = input("\nChoose interface (1 for UI, 2 for CLI): ")
        if interface_choice == "2":
            cli_chat()
        else:
            print("\nStarting the personal assistant UI...")
            assistant_ui = create_assistant_ui()
            display(assistant_ui)
            # Usage instructions
            print("\n--- Instructions ---")
            print("1. Type your question in the text box")
            print("2. Press Enter or click 'Send'")
            print("3. Wait for the assistant's response")
            print("4. Click 'Clear Chat' to start a new conversation")
            print("----------------------")
    else:
        print("\nSkipping UI launch due to test failure.")
        print("You may want to try the CLI interface by calling cli_chat() directly")

# Running the conversational assistant
run_assistant()
Trying It Out
If everything has gone well, it is now time to have fun and interact with our newly built assistant. Below is an example excerpt of the conversational workflow.
Running quick test...
Test Question: What can you help me with?
Response:
1. General knowledge: I can provide information on a wide range of topics, from history and science to pop culture, current events, and more.
2. Problem-solving: Need help with a math problem, figuring out how to do something, or troubleshooting an issue? I'm here to guide you.
3. Research: If you have a specific topic or question in mind, I can help you find reliable sources and summarize the information for you.
4. Language assistance: Need help with writing, grammar, spelling, or translation? I can assist with that.
5. Fun facts and trivia: Want to impress your friends with interesting facts or just looking for a good laugh? I've got you covered!
6. Time management and organization: Strategies to help you stay on top of your tasks and projects.
7. Personal development: Tips for learning new skills, setting goals, or managing your emotions.
Just let me know what you need, and I'll do my best to assist you! Remember, I can't always give away all the answers, but I'll certainly try to make the process as enjoyable and informative as possible.
Generation time: 18.04 seconds
Choose interface (1 for UI, 2 for CLI):
Below is an example of live interaction through the UI.
Qwen-based conversational assistant's UI. Image by Author
Conclusion
In this article, we demonstrated how to build a simple conversational assistant application powered by a lightweight yet powerful Qwen language model. This application is designed to be run and tried out efficiently in a GPU setting like those offered by Google Colab notebook environments.