TalkMiner: A Search Engine for Online Lecture Video

John Adcock¹, Matthew Cooper¹, Laurent Denoue¹, Hamed Pirsiavash², Lawrence A. Rowe¹

¹ FX Palo Alto Laboratory, 3400 Hillview Ave., Palo Alto, CA 94304 USA
² Dept. of Computer Science, University of California, Irvine, Irvine, CA 92697 USA

{last name}@fxpal.com, [email protected]

ABSTRACT

TalkMiner is a search engine for lecture webcasts. Lecture videos are processed to recover a set of distinct slide images, and OCR is used to generate a list of indexable terms from the slides. On our prototype system, users can search and browse lists of lectures and slides in a specific lecture, and play the lecture video. Over 10,000 lecture videos have been indexed from a variety of sources. A public website now allows users to experiment with the search engine.

Categories and Subject Descriptors
H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems

General Terms
Algorithms, Experimentation

Copyright is held by the author/owner(s).
MM'10, October 25–29, 2010, Firenze, Italy.
ACM 978-1-60558-933-6/10/10.

1 Overview

Lecture webcasts are readily available on the Internet. These webcasts include class lectures (e.g., Berkeley Webcast, MIT Open Courseware, etc.), research seminars (e.g., Google Tech Talks, PARC Forums, etc.), product demonstrations, and training materials.

Conventional web search engines can be used to locate these lecture videos, but only when supporting text appears on the hosting web page, or the media has been tagged or otherwise authored within a purposed hosting system. But users, especially students, need to find the locations within a video where an instructor discusses a specific topic. Addressing this need requires a search engine that can identify relevant material within the content of the webcast.

TalkMiner provides an enhanced search and browsing system designed to improve the usefulness of lecture webcasts by enabling keyword search of slides appearing within a presentation video, and video playback directly from the time a chosen slide appears (see Figure 1). The system analyzes the videos to identify and extract unique slide images along with their time codes. This allows seeking the embedded video to the appearance of each detected slide. OCR is applied to detected slide images, and a text search index is built from the recovered words.

Figure 1: Viewing slides and using them to seek the embedded player. Slides matching the user-specified search terms are highlighted.

TalkMiner builds its search index and interface from commonly recorded video. It requires neither dedicated lecture-capture systems, nor careful post-capture authoring, nor even constraints on the style of the video capture. Thus, the system can scale to include a greater volume and variety of existing and newly created content at a much lower cost than would otherwise be possible. Also, by leveraging existing online video distribution infrastructure to embed webcasts within an enhanced interface, the system minimizes storage and bandwidth requirements, further aiding scalability and portability.

The system currently indexes lecture videos from three sites: YouTube [7], U.C. Berkeley [1], and blip.tv [2]. The current number of talks indexed is over 11,000.

2 System Architecture

An overview of the architecture appears in Figure 2. The system is composed of two main components: the back-end video indexer and the front-end web server. The video indexer searches the web for lecture webcasts and indexes them. It automatically identifies slide images and processes them to create the text search index.

Figure 2: System architecture showing separate back-end, front-end, and content hosting components.

The method for identifying and accessing the presentation media varies slightly for each source site, but generally videos are identified by parsing RSS feeds to which the system is subscribed. Once downloaded, it takes roughly 6 minutes to process a 60-minute talk (i.e., 10 percent of the real-time talk duration), so the system is generally limited by download speed rather than processing speed.

The TalkMiner web-based front end is implemented in Java Server Pages (JSP) running on an industry-standard Apache/Tomcat combination. The indexing and search framework, implemented with DBSight [4], runs in the same Tomcat instance. At runtime, searches are performed through the DBSight web application against previously computed Lucene indexes of the MySQL talk database to render the search results lists. The detailed talk page reads the detailed slide text and timing information directly from the MySQL database. Embedded player control and within-talk text searches are performed in JavaScript, freeing the system from further web or database access until the next search.

3 User Interaction

The TalkMiner search interface resembles typical web search interfaces. The user enters one or more search terms, and a list of talks that include those terms in the title, abstract, or presentation slides is displayed, as shown in Figure 3. The information displayed for each talk on the search results pages includes a representative keyframe, the title of the lecture, and the channel or source of the talk. Other metadata displayed includes the duration of the talk, the number of slides, the publication date, and the date it was indexed by TalkMiner. An attribution and link to the original video source is also provided.

Figure 3: Search results. Each lecture shows a representative keyframe, attribution, title, and description when available. Results can be filtered or sorted based on metadata including the date, duration, number of slides, and source of the webcast.

The user can browse a list of talks and alter the sorting and filtering criteria for the listing. By default, talks are sorted by relevance to the query terms as computed by the Lucene text retrieval software [6]. Other available sort attributes include publication date, number of slides, channel, and rating. The first column on the left side of the results page includes interface controls to filter results according to specific criteria (e.g., the year of publication, the channel, etc.). It also includes a list of recent search queries to allow users to re-execute a recent query.

Search results link to the detailed talk view as depicted in Figure 1. Slides matching the query are highlighted, and the user can control the playback position of the embedded video player by selecting the slide thumbnail with the content of interest.

4 Slide Detection

We adapted a straightforward frame-difference based analysis from ProjectorBox [5] to identify keyframes for TalkMiner. We extract a keyframe for nearly stationary segments of a minimum length; these generally correspond to slides. We have extended this baseline approach to address several frequently occurring cases in which it often selects spurious keyframes or misses slide appearances altogether:

• full-frame shots of the speaker with neither navigational nor informational value.
• shots of slides that contain "picture-in-picture" streams.
• shots from the back of the room that include the audience in the foreground and/or the speaker.

To improve our slide detection we have incorporated spatial filtering to ameliorate the effects of insignificant motion, and self-bootstrapping visual models to filter out shots of the speaker.

5 Copyright Considerations

Video on the web exists under a wide variety of copyrights and terms of use, and the implementation of TalkMiner has taken this into account. University lecture material usually has an explicit Creative Commons [3] license, but even these vary in their scope; in particular, they do not always allow the creation of derivative works, which puts various desirable modifications on shaky legal ground. The potential value of TalkMiner is much higher when using content without copyright restrictions, such as would be the case if the system were deployed by the copyright holder proper.

6 References

[1] Berkeley Webcasts. http://webcast.berkeley.edu/.
[2] Blip TV. http://www.blip.tv/.
[3] Creative Commons. http://creativecommons.org, 2007.
[4] DBSight. http://www.dbsight.net/.
[5] L. Denoue, D. Hilbert, D. Billsus, and M. Cooper. ProjectorBox: Seamless presentation capture for classrooms. In World Conf. on E-Learning in Corporate, Government, Healthcare, and Higher Education, 2005.
[6] Apache Lucene. http://lucene.apache.org/java/docs/.
[7] YouTube. http://www.youtube.com/.
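The RSS-based talk discovery described in Section 2 can be sketched as follows. This is a minimal illustration rather than the actual TalkMiner crawler: the feed contents, URLs, and the `discover_talks` helper are hypothetical, and real feeds from YouTube, U.C. Berkeley, or blip.tv differ in structure from site to site.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal RSS 2.0 feed of the kind the indexer might subscribe to.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Lecture Channel</title>
    <item>
      <title>Lecture 1: Introduction</title>
      <enclosure url="http://example.com/lecture1.mp4" type="video/mp4"/>
    </item>
    <item>
      <title>Lecture 2: Search Engines</title>
      <enclosure url="http://example.com/lecture2.mp4" type="video/mp4"/>
    </item>
  </channel>
</rss>"""

def discover_talks(feed_xml):
    """Return (title, video_url) pairs for each item in an RSS feed."""
    root = ET.fromstring(feed_xml)
    talks = []
    for item in root.iter("item"):
        title = item.findtext("title")
        enclosure = item.find("enclosure")
        if enclosure is not None:
            talks.append((title, enclosure.get("url")))
    return talks

print(discover_talks(FEED))
```

Each discovered URL would then be handed to the downloader and, from there, to the slide-detection and OCR stages.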
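The core idea behind the search described in Sections 1 and 3 — indexing OCR'd slide text so that a query can be resolved to specific slides, and thus to seek points in the video — can be illustrated with a toy inverted index. The production system uses DBSight and Lucene over a MySQL database; the data and helper names below are invented for illustration.

```python
from collections import defaultdict

def build_slide_index(talks):
    """Build an inverted index from words to (talk_id, slide_time) postings.

    `talks` maps a talk id to a list of (time_in_seconds, ocr_text) slides,
    standing in for the OCR output of the real indexer.
    """
    index = defaultdict(set)
    for talk_id, slides in talks.items():
        for time, text in slides:
            for word in text.lower().split():
                index[word].add((talk_id, time))
    return index

def search(index, query):
    """Return postings for slides containing every query term."""
    postings = [index.get(w.lower(), set()) for w in query.split()]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical OCR output for two talks.
talks = {
    "talk42": [(0, "Introduction to search engines"),
               (95, "Inverted index construction"),
               (310, "Ranking and relevance")],
    "talk7":  [(12, "Lecture capture systems"),
               (240, "Building a text index from OCR")],
}
index = build_slide_index(talks)
print(search(index, "index"))
```

Each returned `(talk_id, slide_time)` pair is exactly what the detailed talk view needs: the matching slide can be highlighted and the embedded player sought to that time code.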
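The baseline frame-difference analysis of Section 4 — extract a keyframe for each nearly stationary segment of a minimum length — can be sketched as below. The difference measure, threshold, and minimum segment length are illustrative placeholders; the actual ProjectorBox-derived implementation, with its spatial filtering and speaker models, is more involved.

```python
def frame_difference(a, b):
    """Mean absolute pixel difference between two equal-size grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def detect_slides(frames, diff_threshold=5.0, min_length=3):
    """Return (start_index, keyframe) pairs for nearly stationary segments.

    A segment is a maximal run of consecutive frames whose pairwise
    difference stays below diff_threshold; segments shorter than
    min_length frames are discarded as transient motion.
    """
    keyframes = []
    start = 0
    for i in range(1, len(frames) + 1):
        # A segment ends at the last frame or when a large change occurs.
        if i == len(frames) or frame_difference(frames[i - 1], frames[i]) > diff_threshold:
            if i - start >= min_length:
                keyframes.append((start, frames[start]))  # first frame represents the segment
            start = i
    return keyframes

# Toy single-pixel "frames": a slide shown for 4 frames, a 2-frame
# transient (too short to count), then a second slide for 5 frames.
frames = [[0]] * 4 + [[100]] * 2 + [[200]] * 5
print(detect_slides(frames))
```

On this input the detector reports two keyframes, one per stable slide, and skips the short transient, which mirrors how the spurious cases listed in Section 4 would otherwise slip through a naive detector.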