PHATPHUM.ME
Academic, 2025

Miley Chatbot

CS341 Big Data: Product Recommendation Bot

Role: AI/ML Engineer

Year: 2025

Team: Group Project

Tech Stack: LLM, Qwen, Python, NLU, Hugging Face, GPU Cloud (Runpod.io)

A group project for CS341 (Big Data) that built a product recommendation chatbot powered by a Large Language Model (LLM) and an end-to-end data workflow. The system uses only open-source technologies and covers the full pipeline: understanding user messages, retrieving relevant products via vector search, and composing a natural-language response grounded in the retrieved context.

01 The Problem

  • When a product dataset is large and diverse, customers struggle to discover relevant products, and traditional search alone is insufficient.
  • Rule-based chatbots have limited ability to interpret natural language and user intent in context.
  • A practical solution requires connecting language understanding to retrieval and generating responses that remain consistent with retrieved product information.

02 The Solution

  • Adopted Qwen (LLM) with Hugging Face tooling and leveraged GPU Cloud (Runpod.io) to experiment, tune, and evaluate models efficiently.
  • Designed the Qwen-based LLM module as two components: (1) an Entity/Intent Extractor that parses user messages into structured signals before vector search, and (2) a Composer that processes retrieved product lists with conversational context to produce a well-formed final answer.
  • Improved model effectiveness through tuning and prompt/instruction refinement to better fit recommendation scenarios and to reduce inconsistent outputs.
  • Built a structured data pipeline to manage data preparation, query flow across modules, and the handoff between the LLM and vector search in a traceable, maintainable manner.
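
The two-component design can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the Signals schema, the JSON reply format, and the prompt wording are hypothetical, and the extractor reply is a hard-coded string standing in for a real Qwen call. The sketch shows only the handoff: the extractor's text reply is parsed into structured signals, and the Composer prompt is assembled from the user message, the signals, and the retrieved products.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Signals:
    """Structured output of the Entity/Intent Extractor (hypothetical schema)."""
    intent: str
    entities: dict = field(default_factory=dict)


def parse_extractor_output(raw: str) -> Signals:
    """Parse the extractor's JSON reply; fall back to a safe default on bad output."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("extractor reply is not a JSON object")
        return Signals(
            intent=str(data.get("intent", "unknown")),
            entities=dict(data.get("entities") or {}),
        )
    except (ValueError, TypeError):  # json.JSONDecodeError is a ValueError
        return Signals(intent="unknown")


def build_composer_prompt(user_message: str, signals: Signals, products: list[dict]) -> str:
    """Assemble a Composer prompt grounded in the retrieved product list."""
    product_lines = [f"- {p['name']} ({p['price']})" for p in products]
    return (
        "Answer using only the products listed below.\n"
        f"User message: {user_message}\n"
        f"Intent: {signals.intent}\n"
        "Products:\n" + "\n".join(product_lines)
    )


# Stand-in for the extractor model's reply (illustrative, not real model output).
raw_reply = '{"intent": "recommend", "entities": {"category": "shoes"}}'
signals = parse_extractor_output(raw_reply)
prompt = build_composer_prompt(
    "I need running shoes",
    signals,
    [{"name": "Runner X", "price": "$50"}],
)
```

Falling back to an "unknown" intent when the reply is not valid JSON is one simple way to keep the pipeline running when the model produces inconsistent output; the project's actual handling is not described here.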

03 The Result

  • The system extracts entity and intent signals before retrieval, improving the structure and relevance of product candidates returned by vector search.
  • The Composer generates natural, context-aware responses grounded in retrieved product information, improving overall conversational quality.
  • Delivered practical end-to-end experience with an open-source LLM stack: data pipeline design, instruction/prompting, evaluation, and iterative tuning for better performance.
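
To make the first result concrete, the sketch below shows one way an extracted entity could narrow the candidates before similarity ranking. It is a toy under stated assumptions: the catalog, the two-dimensional vectors, and the category field are invented for illustration, and a plain-Python cosine similarity stands in for whatever embedding model and vector store the project actually used.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, catalog, category=None, top_k=3):
    """Rank products by similarity, optionally filtered by an extracted category."""
    candidates = [p for p in catalog if category is None or p["category"] == category]
    ranked = sorted(candidates, key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    return ranked[:top_k]


# Hypothetical catalog with made-up embeddings.
CATALOG = [
    {"name": "Trail Shoe", "category": "shoes", "vector": [0.9, 0.1]},
    {"name": "City Sneaker", "category": "shoes", "vector": [0.6, 0.4]},
    {"name": "Rain Jacket", "category": "outerwear", "vector": [0.95, 0.05]},
]

query = [1.0, 0.0]
unfiltered = [p["name"] for p in search(query, CATALOG)]
filtered = [p["name"] for p in search(query, CATALOG, category="shoes")]
```

In this toy data the jacket is the nearest vector to the query, so it ranks first without the filter; passing the extracted category keeps only shoes in the candidate list.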
