Zachary Dixon and Kelly George, both professors at Embry-Riddle Aeronautical University, were at a faculty meeting and commiserating about Course Hero, a website students use to upload and share documents related to their courses, when an idea came to them. What if they could come up with an automated way to help professors identify content from their courses that had been shared on the site, and request its removal?
Professors have long worried that students who share copyrighted course documents, including tests and quizzes, on Course Hero, which bills itself as "an online learning platform of course-specific study resources," make it easier for others to cheat.
“Ask any faculty member if they know about Course Hero, and they’ll say, ‘I’ve heard a little bit about it, but tell me more,’” said George, an associate professor in Embry-Riddle’s College of Aeronautics, which has its main campuses in Arizona and Florida. “You open your laptop and say, ‘What course do you teach?’ You throw that up and you see 900 or 1,200 artifacts” -- documents related to the course -- “and you can cursorily look through them. One person looked through and said, ‘I just wrote that course and there’s one of the major assignments, a final paper, up there.’”
“Dr. George and I started thinking, what can we do to get our hands around this issue,” said Dixon, an assistant professor of humanities and communication. “If there are 100,000 documents matching our university’s content, that’s too much for any one thing. It became clear that we had to find a new way to kind of automate the process, game it, if you will, and that set us off on the journey.”
Dixon and George have worked with computer science students to develop a tool they call CourseVillain, a customized search engine that searches Course Hero for documents related to Embry-Riddle courses and partially autopopulates copyright takedown requests. As of late last week, the search engine had turned up 237,293 artifacts traceable to Embry-Riddle, according to George.
The takedown requests are not automatic, and the tool is not a cure-all: individual faculty members have to justify why they own the copyright to a document, such as a test or a quiz, that they want taken down. Nor can a professor assert copyright over every document; a student essay, for instance, would not be covered.
“CourseHero.com seems to have specific verbiage that signifies a threshold of ownership over artifacts,” Dixon said. “Reliably figuring out the boundaries of ownership over academic content is going to be a complex process. That process is a key part of the project's future and development.”
In the meantime, George said, the tool gives faculty an opportunity to proactively discourage students from using or posting material to Course Hero.
“Let’s educate the students as well and tell them this is an academic integrity violation,” George said. “We have one faculty member who found a student in his class who shared something and came right up and asked him, ‘Is this your assignment? Did you share this on Course Hero?’ When the student said yes, the faculty member said, ‘You need to take it down; these are the reasons why.’”
The tool also provides what Dixon describes as a “more clear, real-time rendering of how exposed academic content is.”
“Knowing what is circulating in these crowdsourced spaces is very important for identifying course content that needs to be updated, revised or completely redeveloped,” Dixon said. “For example, with reference to an English composition essay, if I know a particular essay was being widely shared, I can change the prompt, coach students away from the compromised topic or subject, pay particular attention to what students are submitting and cross-reference with CourseHero.com if I'm suspicious, or even simply post an announcement to the class that I'm savvy about plagiarism and I've an eye trained on CourseHero.com.”
Last month, at a conference organized by the International Center for Academic Integrity (ICAI), Dixon presented the research he and George conducted using CourseVillain. They used the tool to analyze content on Course Hero from seven Embry-Riddle courses across a variety of disciplines.
They developed what they describe as a “course compromise metric” to classify the documents shared for a course on the site as “low-value,” “medium-value” or “high-value” artifacts and to determine what proportion fell into each category. A low-value artifact might be a set of a student’s own notes -- something that, if shared, would not necessarily compromise the integrity of the course. A high-value artifact might be a test, quiz or paper. Medium-value artifacts would be things like homework or discussion questions.
“In total nearly half of the certified artifacts we collected posed a clear danger to their course’s academic integrity,” Dixon said during his presentation.
“We’re definitely confident that students are exchanging a significant degree of potentially dangerous coursework online. The conditions are ripe, and it looks like students are taking advantage of those conditions to conduct academic misconduct.”
Dixon said the tool was built by a computer science student for less than $10,000. His message to colleges is that they, too, can build such a tool themselves, and inexpensively. “There is a student in your institution who can build this, I guarantee you,” he said at the recent ICAI conference.
Dixon notes, however, that the tool needs continual maintenance: if Course Hero makes a technical change on its end, it can seriously undermine CourseVillain’s effectiveness.
Andrew Grauer, Course Hero's CEO, said he is not familiar with the details of how CourseVillain works. "But the general idea of people coming in and exploring and contributing to and getting value from and helping to improve or police or moderate the platform, that’s absolutely in line with what we’re trying to achieve in creating a more accessible learning platform and teaching platform, in making, aggregating, organizing and disseminating learning and teaching resources in a transparent way," he said. "Wherever you are in the world, whatever you’re studying, whatever you’re trying to learn across grade levels, we want to make it as open and transparent as possible."
Course Hero, which has entered into partnerships with professors who contribute resources to the site, has policies requiring students to follow their college's academic integrity rules and prohibiting the unauthorized uploading of copyrighted materials. Grauer said there have been fewer than 100 DMCA (Digital Millennium Copyright Act) takedown requests from Embry-Riddle-affiliated educators over the last several years. (George said that while she does not know the exact number of takedown requests emanating from Embry-Riddle, that number seems low. "I know before I started this project, I could account for at least 50 of those requests," she said.)
"From Course Hero’s perspective, we try to make our platform and the resources on the platform as discoverable as possible," Grauer said. "That’s what we’re trying to accomplish, is making accessible our student-generated, educator-generated, tutor-generated resources."
Researchers studying academic misconduct say tools like CourseVillain would be useful in addressing concerns about the growing use of Course Hero and similar sites that encourage students to submit course-related documents or specific course-related questions and that promote themselves as study aids. Such sites have legitimate uses but can also facilitate cheating if, for example, students post quiz or test questions or entire exams.
Camilla Roberts, president of ICAI and director of the Kansas State University Honor and Integrity System, said academic integrity offices have seen a big increase in student use of popular “homework help” sites like Course Hero and Chegg since classes moved online due to the pandemic.
“In talking with just faculty at my institution, I think they’re looking for ways to figure out how can we stay ahead,” Roberts said. “I’ve had professors tell me they just don’t have time to look for everything.”
Thomas Lancaster, a senior teaching fellow in computing at Imperial College London, has conducted research showing a sizable increase since the start of the pandemic in the volume of questions posted by students on Chegg. He attributes the rise in questions to a likely increase in the number of students using the site to cheat. (Chegg, which advertises that questions submitted by students will be answered by an "expert" in as little as 30 minutes, has objected to the implication that the increased usage of the site is correlated with an increase in cheating.)
“If we want to tackle student cheating through sites like Course Hero, then tools like CourseVillain are going to be really important for doing that, because the volume of work that appears on those sites is so high it's nearly impossible to do any type of tracing by hand,” Lancaster said. “We need those extra resources, we need the automation, we probably need some artificial intelligence linked with that.”
Lancaster added that the CourseVillain model may not translate as well for other sites that don’t link student submissions to a specific university or course.
“From what I’ve seen about CourseVillain, it works really well when students are honest and they say what university the document is from and they put in information that makes it traceable, but that model doesn't translate to all the other services out there,” Lancaster said.
Ethan Fieldman, CEO of the educational technology company Study Edge and co-director of Math Nation, which provides middle and high school math curricula, said Chegg "is a whole other ballgame." He said students using the site often remove the names, logos and images of the schools they attend, making the content harder to search.
Fieldman said Study Edge has developed a similar automated search and takedown tool, an "anticheating bot," focused on Chegg. It is currently used internally to check Chegg for questions from Math Nation's own database of math problems, but the company plans to make the search tool available for purchase by colleges, K-12 school districts and individual educators soon. (He anticipates the price will be $4,000 a year for institutions and $20 a month for individual educators.) A professor could upload a 40-question test once, and the bot would repeatedly search Chegg for those questions and alert the professor if any of them appeared on the site weeks or months later. Professors would then have the option of ignoring the question or submitting a takedown request.
"It’s not for us about the copyright," Fieldman said. "What we care about is students are able to cheat, that they can find the answers to our assessments on Chegg, and that’s a problem."
Candace Sue, head of academic relations for Chegg, said in a written statement that the company invests resources in preventing students from cheating, including through the launch of Honor Shield, which blocks exam questions during designated exam periods. The program is free for any American professor to use.
"Students need help and the overwhelming majority of Chegg users are hardworking and honest, and they use our platform to supplement their learning," Sue said. "While we are not aware of this specific tool, we take any attempts to cheat by those who abuse our offerings extremely seriously. Efforts to curb academic dishonesty are critical, and we invest time and resources to prevent misuse of our learning platform in support of the students who are here to genuinely learn their course materials. In the last eight months, we have significantly increased our staff, investment in technology, infrastructure, and tools to prevent misuse of our platform."