Sonal is working on Zingg, an open-source entity resolution tool that links and deduplicates entity data to build a single source of truth. Zingg unifies data at scale using Spark and ML. https://github.com/zinggAI/zingg
Prior to Zingg, Sonal ran a boutique data science and data engineering consultancy focusing on Spark, ML, and Cloud. Sonal was also part of the Program Committee for Strata Data and AI. She is a repeat speaker at Spark Summit, Strata, DataCon LA, and other leading data and AI conferences. She holds a BTech from IIT Delhi and has 23 years of industry experience.
Open Source Entity Resolution Using Zingg
Real-world data is far from perfect. It often contains multiple records belonging to the same entity (e.g., a customer or a property). These records can come from multiple systems, with variations across different attributes. This makes them hard to combine, especially as data volumes grow. This talk will describe Entity Resolution, a technique for identifying records, in a single data source or across multiple data sources, that refer to the same real-world entity and linking them together. In Entity Resolution, strings and other attributes that are nearly identical, but perhaps not exactly the same, are matched without relying on a unique identifier. We’ll discuss the challenges of Entity Resolution and how we can leverage open-source Zingg to resolve customers, suppliers, products, and parts.
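To make the idea of matching without a unique identifier concrete, here is a minimal sketch in plain Python. It is not Zingg's API or algorithm; it only illustrates fuzzy matching on near-identical strings using the standard library's `difflib`. The sample records, the field names, the helper functions, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; note there is no shared unique identifier.
records = [
    {"name": "Jonathan Smith", "city": "New York"},
    {"name": "Jonathon Smith", "city": "New York"},
    {"name": "Maria Garcia", "city": "Austin"},
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(r1: dict, r2: dict, fields=("name", "city")) -> float:
    """Average the per-field similarities of two records."""
    return sum(similarity(r1[f], r2[f]) for f in fields) / len(fields)


def find_matches(rows: list, threshold: float = 0.85) -> list:
    """Return index pairs of records whose score meets the threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(rows), 2)
        if record_score(r1, r2) >= threshold
    ]


matches = find_matches(records)
print(matches)  # index pairs judged to refer to the same entity
```

This sketch compares every pair of records, so the work grows quadratically with the number of records. That is one reason entity resolution becomes harder as data volumes grow, and why a hand-tuned threshold like the one assumed here rarely holds up across varied real-world attributes.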