Every minute, a flood of new videos is uploaded to platforms like YouTube, TikTok, and Instagram, and live streaming is increasingly common. Yet technology and media companies still struggle to analyze and understand all of that content.
Netra, a company founded by MIT alumni, uses artificial intelligence to analyze video at scale. Its system can identify activities, objects, emotions, locations, and more, and uses them to organize videos and give them context.
Companies using Netra's system can automatically group similar content into highlight reels or news segments, flag nudity, violence, and other inappropriate material, and improve ad placement. In advertising, Netra helps brands move away from controversial tracking of individuals by pairing videos with contextually relevant ads.
"The industry is collectively shifting toward content-based advertising, or what we call affinity advertising, moving away from the somewhat invasive cookie-based and pixel-based tracking approaches," explains Shashi Kant SM '06, Netra's co-founder and CTO.
Beyond advertising, Netra is making video content easier to search. Once videos have been processed by Netra's system, users can start with a keyword search, then click on results to find similar content and home in on increasingly specific events within large video libraries.
For example, Netra's system can process an entire baseball season's footage and help users locate every hit. By selecting specific plays and asking for similar ones, users can narrow the results further, for instance to the singles that were nearly outs and drew strong reactions from fans.
"Video represents today's most significant information resource," Kant asserts. "It surpasses text by orders of magnitude in both information richness and volume, yet search capabilities for video content remain largely untapped. It's truly the last frontier of information organization."
Pursuing a Vision
Internet pioneer and MIT professor Sir Tim Berners-Lee has long worked to improve machines' ability to interpret data on the internet. Kant conducted research under Berners-Lee as a graduate student and drew inspiration from his vision for improving how machines store and use information.
"The holy grail for me is establishing a new paradigm in information retrieval," Kant shares. "I believe web search remains in its 1.0 iteration. Even Google represents 1.0. This has been the driving force behind Sir Tim Berners-Lee's semantic web initiative, and it's what I took away from that experience."
Kant was also part of the winning team in the MIT $100K Entrepreneurship Competition (then known as the MIT $50K). He helped write the software for the Active Joint Brace, an electromechanical orthotic device designed to assist people with disabilities.
After graduating in 2006, Kant started a company called Cognika that built AI into its products. At the time, AI had a poor reputation because of overhyped promises, so Kant used terms like "cognitive computing" when pitching investors and potential customers.
Kant founded Netra in 2013 to apply AI to video analysis. Today he faces the opposite problem: a crowded field of startups that all claim to use AI. Netra tries to stand out by demonstrating what its system can do.
Netra's system can quickly analyze videos and organize them by what happens in each clip, grouping scenes that share similar actions, emotions, product usage, and more. That analysis produces metadata for each scene, but Kant stresses that the system goes well beyond keyword tagging.
"We work with embeddings," Kant explains, describing how his system classifies content. "When there's a scene of someone hitting a home run, it possesses a distinctive signature that we capture through an embedding. An embedding consists of a sequence of numbers, or a 'vector,' that encapsulates the essence of content. Tags merely represent human-readable versions of these embeddings. We train models to detect specific elements like home runs, but beneath the surface, a neural network creates an embedding that differentiates the scene in numerous ways from other plays like outs or walks."
By defining relationships between clips, Netra's system lets customers organize and search their content in new ways. Media companies can identify the most exciting moments of a sporting event based on spectators' emotions. They can also categorize content by subject matter, location, or the presence of sensitive or disturbing material.
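As a rough illustration of the embedding idea Kant describes, the Python sketch below compares clips by the cosine similarity of their vectors and derives a human-readable tag from the nearest reference vector. Every clip name, vector, and tag is invented for the example, and cosine similarity is an assumed choice of comparison; the article does not say how Netra's models measure similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 4-dimensional embeddings (assumption). A trained neural network
# would produce these, typically with far more dimensions.
clips = {
    "clip_home_run_1": [0.9, 0.1, 0.8, 0.0],
    "clip_home_run_2": [0.8, 0.2, 0.9, 0.1],
    "clip_strikeout": [0.1, 0.9, 0.0, 0.7],
    "clip_walk": [0.2, 0.3, 0.1, 0.9],
}

# Hypothetical reference vector for each human-readable tag (assumption).
tag_prototypes = {
    "home run": [1.0, 0.0, 1.0, 0.0],
    "out": [0.0, 1.0, 0.0, 0.5],
    "walk": [0.0, 0.0, 0.0, 1.0],
}


def most_similar(query_id, k=2):
    """Return the ids of the k clips closest to the query clip."""
    query = clips[query_id]
    scored = [
        (cosine_similarity(query, vec), clip_id)
        for clip_id, vec in clips.items()
        if clip_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [clip_id for _, clip_id in scored[:k]]


def tag_for(clip_id):
    """Pick the tag whose reference vector is closest to the clip."""
    vec = clips[clip_id]
    return max(tag_prototypes, key=lambda t: cosine_similarity(vec, tag_prototypes[t]))


print(most_similar("clip_home_run_1", k=1))
print(tag_for("clip_strikeout"))
```

In this toy setup the tag is only a label attached to a region of the vector space, while the "find similar" query works directly on the vectors, which is why a search by example can separate plays that share the same tag.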
These capabilities matter for online advertising. An advertising agency representing a brand such as the outdoor apparel company Patagonia could use Netra's system to place Patagonia's ads alongside hiking-related content. Media companies could offer brands like Nike premium ad space around clips featuring their sponsored athletes.
These capabilities are helping advertisers comply with new privacy regulations around the world that restrict the collection of data on individuals, particularly children. Targeting specific demographic groups with ads and tracking their online behavior has also become increasingly controversial.
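The contextual placement described above can be sketched using only information about the clip, with no data about the viewer. The Python example below is a minimal illustration under stated assumptions: the clip and ad vectors, the `sensitive` flag, the `place_ad` helper, and the 0.7 relevance threshold are all hypothetical and are not taken from Netra's product.

```python
import math
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    embedding: list
    sensitive: bool = False  # e.g. flagged for violence or nudity (assumed flag)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical ad embeddings living in the same vector space as the clips.
ads = {
    "outdoor_apparel_ad": [0.9, 0.1, 0.0],
    "running_shoe_ad": [0.1, 0.9, 0.1],
}

clips = [
    Clip("mountain_hike", [0.8, 0.2, 0.1]),
    Clip("marathon_finish", [0.2, 0.9, 0.0]),
    Clip("street_fight", [0.1, 0.3, 0.9], sensitive=True),
    Clip("cooking_show", [0.0, 0.1, 0.9]),
]


def place_ad(clip, min_score=0.7):
    """Return the most relevant ad id, or None if no ad should run.

    Sensitive clips never receive an ad, and an ad is placed only when
    its similarity to the clip reaches min_score.
    """
    if clip.sensitive:
        return None
    best_ad, best_score = None, min_score
    for ad_id, ad_vec in ads.items():
        score = cosine_similarity(clip.embedding, ad_vec)
        if score >= best_score:
            best_ad, best_score = ad_id, score
    return best_ad


placements = {clip.clip_id: place_ad(clip) for clip in clips}
for clip_id, ad_id in placements.items():
    print(clip_id, "->", ad_id)
```

The function takes only the clip as input, so nothing about an individual viewer is collected or stored in order to choose the ad.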
Kant views Netra's AI engine as a step toward giving consumers greater control over their data, an idea long championed by Berners-Lee.
"While Netra doesn't directly implement my CSAIL research, the conceptual frameworks I developed at CSAIL certainly manifest in our solution," Kant reflects.
Transforming Information Storage
Netra counts some of the nation's largest media and advertising companies among its clients. Kant believes its system could eventually help anyone search and organize the growing volume of video on the internet, and he expects the product to keep evolving toward that goal.
"Search technology hasn't evolved significantly since its inception for web 1.0," Kant observes. "Currently, we primarily rely on link-based search, which I consider outdated. Users don't necessarily want to visit multiple documents—they want aggregated, contextual, and customizable information from those documents, delivering precisely what they need."
Kant believes such contextualization would greatly improve how information is organized and shared across the internet.
"It's about decreasing reliance on keywords and increasing dependence on examples," Kant elaborates. "For instance, in a video, if I make a statement, is that because I'm misguided, or is there substantial evidence supporting my position? Imagine a system that could identify, 'This other scientist made a similar statement to validate that point, and this scientist responded comparably to that question.' To me, these capabilities represent the future of information retrieval—that's my life's passion. That's why I came to MIT. That's why I've dedicated fifteen years of my life to advancing AI, and that's what I'll continue to pursue."