Analyzing Patterns in Student SQL Solutions

Structured Query Language (SQL), the standard language for relational database management systems, is an essential skill for software developers, data scientists, and other professionals who need to interact with databases. Although SQL is highly structured, learners acquire it in diverse ways. Despite the importance of SQL to many fields, little research has examined how students learn SQL as they work on homework assignments. The aim of this project is to analyze students' SQL submissions to homework problems in a Database course.

Via Levenshtein Edit Distance

The first stage of this project computed the Levenshtein edit distance between each of a student's submissions and that student's final submission, to understand how students reached their final solutions and overcame obstacles along the way. We developed a system that visualizes these edit distances for a given SQL problem, enabling instructors to identify interesting learning patterns and approaches. These findings will help instructors target their instruction on difficult SQL areas and help students learn SQL more effectively.
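As an illustration of the measure described above, the following is a minimal sketch of character-level Levenshtein distance applied to one student's submission history. The function names and the sample queries are hypothetical, and the project's actual system may tokenize or normalize queries differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def distances_to_final(submissions):
    """Edit distance from each submission to the last one (assumed final)."""
    final = submissions[-1]
    return [levenshtein(s, final) for s in submissions]


# Hypothetical submission history for one student on one problem.
history = ["SELECT *", "SELECT * FROM t", "SELECT a FROM t"]
print(distances_to_final(history))
```

Plotting these distances against submission number gives one curve per student. A curve that falls steadily suggests incremental refinement, while a sudden drop suggests a rewrite.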

Via Sequence Alignment Algorithms

In the second part of the project, we are using local and global sequence alignment algorithms to identify patterns of similar approaches students used to solve a given SQL assignment. We started by producing a heatmap that shows the differences and similarities among all the submissions students made for a given problem. We are currently analyzing heatmaps of different students' submissions to the same SQL problem to identify and categorize similar approaches.
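To make the idea concrete, here is a minimal sketch of global alignment (Needleman-Wunsch scoring) over whitespace-separated tokens, used to fill a pairwise similarity matrix of the kind a heatmap would display. The scoring values (match +1, mismatch -1, gap -1), the tokenization, and the function names are assumptions for illustration, not the project's actual settings.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for aligning token lists a and b end to end."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            score[i][j] = max(
                diagonal,                 # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
    return score[-1][-1]


def similarity_matrix(submissions):
    """Pairwise alignment scores; higher means more similar."""
    tokens = [s.upper().split() for s in submissions]  # case-insensitive tokens
    return [[global_alignment_score(x, y) for y in tokens] for x in tokens]


# Hypothetical submissions from one student.
matrix = similarity_matrix([
    "select a from t",
    "SELECT a FROM t WHERE a > 1",
])
for row in matrix:
    print(row)
```

A local alignment variant (Smith-Waterman) would instead clamp each cell at zero and take the maximum over the whole table, which highlights shared fragments rather than overall similarity.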


System Overview


The X and Y axes represent the student's submission number. The darker the color, the more similar the submissions are.

Learning Next-Generation Databases


TriQL’s Query Builder and Query Result Interfaces. The QBI allows users to construct the query using a user-friendly GUI, and the Query Result Interface shows the query result in its native database.


Breakdown of errors by SQL concept evaluated


Cypher Error submissions per Concept


Breakdown of the most common JavaScript and MongoDB errors by question


TriQL System Architecture: the Query Builder lets users build queries using a GUI; the Intermediate Query Generator converts user queries to DataLog; the Schema and Query Translator generates the schemas of the three databases and converts the DataLog query into SQL, Cypher, and MongoDB queries.

With more organizations relying on data to make crucial business decisions, database systems have become essential in managing financial, medical, and scientific data. Consequently, managing databases has become a necessary skill for programmers, data analysts, and data scientists to accelerate scientific inquiry and business decision-making. However, with the abundance of databases supporting various data models, such as relational, graph, and document-oriented, beginners often find it challenging to decide which database model to learn. Experienced developers also struggle to learn new database models, because different models have different data structures and query languages. This project aims to develop student and instructor tools that facilitate the learning and teaching of next-generation database systems.

TriQL: A tool for learning relational, graph and document-oriented database programming

This project introduces TriQL, a system for helping novices learn the structures (schemas) and query languages of three major database systems: MySQL (a relational database queried with SQL, the Structured Query Language), Neo4j (a graph database), and MongoDB (a document/collection-oriented database). TriQL offers learners a graphical user interface to design and execute a query against a generic database schema without requiring them to have any database programming experience. TriQL follows an interactive approach to learning new database models, supporting a dynamic and agile learning environment that can be easily integrated into database labs and homework assignments.
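To give a feel for what learners compare across the three systems, the sketch below renders one simple filter-and-project query as SQL, Cypher, and a MongoDB find call. This is not TriQL's implementation: it skips the DataLog intermediate step entirely, the Course schema and all function names are hypothetical, and values are not escaped, so it is suitable only for illustration.

```python
import json


def to_sql(entity, field, value, columns):
    """Relational form: a table with a WHERE filter."""
    cols = ", ".join(columns)
    return f"SELECT {cols} FROM {entity} WHERE {field} = {value!r};"


def to_cypher(entity, field, value, columns):
    """Graph form: nodes with a label and a property filter."""
    cols = ", ".join(f"n.{c}" for c in columns)
    return f"MATCH (n:{entity}) WHERE n.{field} = {value!r} RETURN {cols}"


def to_mongodb(entity, field, value, columns):
    """Document form: a collection with a filter and a projection document."""
    query_filter = json.dumps({field: value})
    projection = json.dumps({c: 1 for c in columns})
    return f"db.{entity}.find({query_filter}, {projection})"


# Hypothetical query: titles and credits of courses in the CS department.
args = ("Course", "dept", "CS", ["title", "credits"])
for render in (to_sql, to_cypher, to_mongodb):
    print(render(*args))
```

Seeing the same request side by side makes the structural differences visible: a table and columns in SQL, a labeled node and properties in Cypher, and a collection with filter and projection documents in MongoDB.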

A Quantitative Analysis of Student Solutions to SQL, Graph and Document Database Assignments

In this project, we analyze students’ errors in homework submissions of queries written in SQL, Cypher (the query language for Neo4j, the most prominent graph database), and MongoDB (a document-oriented database). Based on tens of thousands of student submissions from homework assignments in the database course I teach at the University of Illinois, we provide a quantitative analysis of students’ learning when solving database problems and suggest a further improvement to the classification of syntactic errors.

Modeling The Content Structure of MOOCs

The number of Massive Open Online Courses (MOOCs) is increasing rapidly, providing students with tremendous opportunities to improve their knowledge and careers. However, most MOOC platforms treat the course as the smallest unit of content delivery, wasting a valuable opportunity for learners to develop customized content tailored to their interests. This project aims to develop the infrastructure needed for offering customized MOOC courses. Our approach is to mine and model the content of existing MOOC courses so that we can build knowledge structures that facilitate customized learning.

Unsupervised Approach for Modeling Content Structures of MOOCs

In the first part of the project, we introduced an unsupervised approach to build the precedence graph of similar MOOCs, where nodes are clusters of lectures with similar content and edges depict alternative precedence relationships. Our approach clusters similar lectures using the PCK-Means clustering algorithm, which incorporates pairwise constraints (Must-Link and Cannot-Link) into the standard K-Means algorithm. To build the precedence graph, we link the clusters according to the precedence relations mined from current MOOCs. Experiments over real-world MOOC data show that PCK-Means with our proposed pairwise constraints outperforms the K-Means algorithm in both Adjusted Mutual Information (AMI) and Fowlkes-Mallows Index (FMI) scores.
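The effect of the pairwise constraints can be sketched with the assignment step of PCK-Means: each point goes to the cluster that minimizes its squared distance to the centroid plus a penalty for every Must-Link or Cannot-Link constraint the choice would violate. The sketch below assumes a single uniform penalty weight and fixed centroids, and omits the centroid update and constraint-based initialization; the data and names are hypothetical.

```python
def pckmeans_assign(points, centroids, must_link, cannot_link, weight=1.0):
    """One greedy PCK-Means assignment pass.

    points, centroids: lists of equal-length numeric tuples.
    must_link, cannot_link: lists of (i, j) index pairs into points.
    """
    labels = [None] * len(points)
    for i, x in enumerate(points):
        best_cost, best_cluster = None, None
        for k, mu in enumerate(centroids):
            cost = sum((a - b) ** 2 for a, b in zip(x, mu))
            # Penalize separating i from an already-labeled must-link partner.
            for a, b in must_link:
                if i in (a, b):
                    other = b if a == i else a
                    if labels[other] is not None and labels[other] != k:
                        cost += weight
            # Penalize joining i with an already-labeled cannot-link partner.
            for a, b in cannot_link:
                if i in (a, b):
                    other = b if a == i else a
                    if labels[other] == k:
                        cost += weight
            if best_cost is None or cost < best_cost:
                best_cost, best_cluster = cost, k
        labels[i] = best_cluster
    return labels


# Hypothetical 1-D lecture embeddings and two cluster centroids.
points = [(0.0,), (0.4,), (0.6,), (1.0,)]
centroids = [(0.0,), (1.0,)]
print(pckmeans_assign(points, centroids, [], []))        # plain K-Means step
print(pckmeans_assign(points, centroids, [(1, 2)], []))  # lectures 1 and 2 must link
```

With no constraints the pass reduces to the ordinary K-Means assignment. Adding a Must-Link between the two middle points pulls the second of them into the first's cluster even though another centroid is closer.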

Topic Transitions in MOOCs

Modeling the relationships among educational topics is a fundamental first step for automating curriculum planning and course design. In the second part of the project, we introduce the Topic Transition Map (TTM), a general structure that models the content of MOOCs at the topic level. TTMs capture the various ways instructors organize topics in their courses by modeling the transitions between topics. We investigate and analyze four different methods that can be exploited to learn the Topic Transition Map: 1) Pairwise Constrained K-Means, 2) Mixture of Unigram Language Model, 3) Hidden Markov Mixture Model, and 4) Structural Topic Model. To evaluate the effectiveness of these methods, we qualitatively compare the topic transition maps generated by each model and investigate how the Topic Transition Map can be used in three sequencing tasks: 1) determining the correct sequence, 2) predicting the next lecture, and 3) predicting the sequence of lectures. Our evaluation revealed that PCK-Means has the highest performance in the first task and HMMULM outperforms other methods in task 2, while there is no clear winner in task 3.
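As a rough illustration of what a topic transition map holds, the sketch below counts topic-to-topic transitions across several courses, normalizes them into probabilities, and uses the result for the next-lecture task. It assumes each lecture is already labeled with a topic, which is the hard part the four methods above address, so this is a simple counting baseline rather than any of those models. The course data and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_map(courses):
    """Map each topic to a probability distribution over the next topic."""
    counts = defaultdict(Counter)
    for sequence in courses:
        for current, following in zip(sequence, sequence[1:]):
            if current != following:  # ignore consecutive lectures on one topic
                counts[current][following] += 1
    return {
        topic: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for topic, ctr in counts.items()
    }


def predict_next(transition_map, topic):
    """Most likely next topic, or None if the topic has no outgoing edges."""
    options = transition_map.get(topic)
    if not options:
        return None
    return max(options, key=options.get)


# Hypothetical topic sequences from three database courses.
courses = [
    ["intro", "sql", "joins", "indexes"],
    ["intro", "sql", "indexes"],
    ["intro", "er-model", "sql", "joins"],
]
ttm = build_transition_map(courses)
print(predict_next(ttm, "sql"))
```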


Integrating the CustomLearn service into MOOC platforms to accommodate goal-oriented learning and other learners' needs

Clustering Student-Written SQL Queries


Echelon: An AI Tool for Clustering SQL Queries

As part of teaching SQL, instructors often rely on auto-grading systems for marking students' assignments. However, such systems lack essential insights into the approaches students use to solve these assignments, allowing subtle flaws in student intuition to go unseen. Further, manual analysis of students' code submissions ranges from costly to impossible, depending on the assessments' frequency. The goal of this project is to use AI to help database instructors quickly identify trends in students' solutions. To that end, we developed a system called Echelon capable of extracting features that instructors deem significant from students' SQL queries and using them to generate clusters that capture the key approaches taken. The system creates a two-dimensional projection, which is then linked to a dashboard that instructors can use to rapidly assess class performance, using clustering algorithms to group student approaches into clean, intuitive categories. Instructors can then address a variety of student approaches, and thus create a more responsive classroom. 
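The general pipeline described above, extracting instructor-chosen features and then grouping queries by them, can be sketched as follows. This is not Echelon's implementation: the feature set is a hypothetical example, the regular expressions are a crude substitute for real SQL parsing, and grouping by identical feature signature stands in for the clustering and two-dimensional projection the system performs.

```python
import re
from collections import defaultdict

# Hypothetical features an instructor might care about.
FEATURES = {
    "join": r"\bJOIN\b",
    "subquery": r"\(\s*SELECT\b",
    "group_by": r"\bGROUP\s+BY\b",
    "distinct": r"\bDISTINCT\b",
}


def extract_features(query):
    """Boolean feature vector: which constructs appear in the query."""
    return tuple(
        bool(re.search(pattern, query, re.IGNORECASE))
        for pattern in FEATURES.values()
    )


def group_by_approach(queries):
    """Group student ids whose queries share the same feature signature."""
    groups = defaultdict(list)
    for student, query in queries.items():
        groups[extract_features(query)].append(student)
    return dict(groups)


# Hypothetical submissions to "list students enrolled in any course".
submissions = {
    "s1": "SELECT name FROM Students s JOIN Enrollments e ON s.id = e.sid",
    "s2": "select name from Students where id in (select sid from Enrollments)",
    "s3": "SELECT s.name FROM Students s INNER JOIN Enrollments e ON e.sid = s.id",
}
for signature, students in group_by_approach(submissions).items():
    print(dict(zip(FEATURES, signature)), students)
```

Here the two join-based solutions fall into one group and the subquery-based solution into another, which is the kind of approach-level distinction an auto-grader's pass/fail result hides.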


FIE 2023 Presentations:
- DB Learning

Current Students:

Graduate Students:
  • Sophia Yang, Ph.D.
  • Ridha Alkhabaz, MCS
  • Mukesh Naresh Chugani, MCS
Undergraduate Students:
  • Nikhil Khandekar
  • Rishabh Jain
  • Tom Reichel
  • Rohit Narayanan

Past Students:

Graduate Students:
  • Ping (Roger) Che
  • Shivani Kamtikar
  • Risha Aich
  • Matthew C Weston, MCS
Past Undergraduate Students:
  • Ridha Alkhabaz
  • Mei Chen
  • Angie Cheng
  • Danjin Jiang
  • Eric Wang
  • Hao Yuan
  • Haorong Sun
  • Haoyu Zhang
  • Karan Abrol
  • Ke Deng
  • Kuihua Liu
  • Lawson Probasco
  • Leyao Zhou
  • Lujia Kang
  • Manasi Abhyankar
  • Micah Jeng
  • Osama Esmail
  • Peilin Rao
  • Qishan Zhu
  • Reetahan Mukhopadhyay
  • Rishabh Jain
  • Russell Seligmann
  • Siqi Xiong
  • Siyun Iu
  • Stephen Fan
  • Tailin Zhang
  • Xiaoying Zhu
  • Yitao Meng
  • Yu Chen
  • Yuxin Wang
  • Zawaadul Karim
  • Zhilin Zhang
  • Zhining Qiu
  • Ziyuan Wei