  • How Data Teams Are Using Vector Embeddings for Semantic Search
    Jun 30 2026
    Episode 83 of The Data Business Podcast dives into the practical uses of vector embeddings for semantic search in enterprise data environments. Lucas and Luna explore how companies like Shopify have leveraged embeddings to power product discovery and internal knowledge retrieval, reducing search-to-purchase time by 12 percent. They break down the technical trade-offs between dense and sparse embeddings, the cost of storing high-dimensional vectors, and why data teams are now embedding everything from customer support tickets to internal documentation. With a focus on real-world implementation details including approximate nearest neighbor algorithms and vector database choices, this episode equips operators and builders with a clear framework for deciding whether semantic search is worth the infrastructure investment. #VectorEmbeddings #SemanticSearch #DataInfrastructure #MachineLearning #Shopify #ApproximateNearestNeighbor #VectorDatabase #Pinecone #Milvus #DenseEmbeddings #SparseEmbeddings #NaturalLanguageProcessing #DataEngineering #BusinessTechnology #SearchOptimization #FexingoBusiness #BusinessPodcast #TheDataBusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
    10 mins
  • What Your Data Catalog Still Gets Wrong About Search
    Jun 30 2026
    Episode 82 of The Data Business Podcast. Lucas and Luna explore why enterprise data catalogs often fail at the most basic function: helping people find the right dataset. They drill into the difference between metadata search and semantic search, using the example of a large retailer that spent $2.4 million on a commercial catalog tool only to discover analysts still couldn't find their own tables. The conversation covers the rise of vector embeddings for data discovery, the concept of 'semantic similarity' in column names, and why the next wave of catalog tools is borrowing from LLM-derived embeddings rather than traditional keyword indexes. A practical episode for anyone who has ever typed 'customer revenue' into a catalog and gotten 47 unrelated tables. #DataCatalog #MetadataSearch #SemanticSearch #VectorEmbeddings #DataDiscovery #DataGovernance #EnterpriseData #DataEngineering #LLM #DataObservability #DataLakehouse #DataMesh #Analytics #BusinessIntelligence #DataProducts #DataTeams #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
    10 mins
  • How Open Table Formats Are Rewriting Data Lakehouse Rules
    Jun 29 2026
    Episode 81 of The Data Business Podcast. Lucas and Luna explore how open table formats—Apache Iceberg, Delta Lake, and Apache Hudi—are reshaping the data lakehouse landscape. They focus on Iceberg's rise to dominance, the engineering decision at Netflix that kicked it off, and why the format war matters for anyone building a modern data stack. Specific numbers: Iceberg adoption grew from 15 percent of data lakehouse users in 2022 to over 60 percent by early 2026. The discussion covers transactional guarantees on object stores, catalog integration, and the painful migration story at a mid-size fintech that switched from Delta Lake to Iceberg. No fluff, no vendor pitches—just the real trade-offs data teams face today. #ApacheIceberg #DeltaLake #ApacheHudi #OpenTableFormats #DataLakehouse #DataInfrastructure #DataEngineering #Netflix #ObjectStore #ACIDTransactions #DataCatalog #FormatWars #BusinessAndTechnology #DataArchitecture #DataMigration #FexingoBusiness #BusinessPodcast #TheDataBusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
    10 mins
  • How Data Teams Are Validating Pipelines with Schema-on-Read
    Jun 29 2026
    Episode 80 of The Data Business Podcast explores how data teams are using schema-on-read validation to catch pipeline failures before they corrupt downstream analytics. Lucas and Luna discuss a real case at a mid-sized e-commerce company where a misclassified field in a Parquet file caused a $200,000 reporting error. They break down the difference between schema-on-write and schema-on-read, explain how tools like Apache Iceberg and Delta Lake enable late-binding schema enforcement, and walk through the trade-offs: flexibility versus performance. The episode also covers how this approach fits into broader data observability and data contract strategies, with practical advice on when to use schema registries like Confluent Schema Registry versus file-level validation. Listeners learn one concrete technique they can apply to their own pipelines. #SchemaOnRead #DataValidation #ApacheIceberg #DeltaLake #DataObservability #DataContracts #DataPipelines #Parquet #SchemaRegistry #DataQuality #DataEngineering #Analytics #EcommerceData #PipelineReliability #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #TheDataBusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
    12 mins
  • How Data Teams Are Using Data Contracts for Cost Allocation
    Jun 28 2026
    Episode 79 of The Data Business Podcast. Lucas and Luna explore how forward-thinking data teams are using data contracts not just for reliability but for precise cost allocation. They dive into a case at a mid-size fintech that cut its cloud data warehouse bill by 28 percent by attaching consumption tags to contract clauses. Lucas explains the mechanics of 'cost-attributed schemas' and Luna questions whether this creates perverse incentives for data producers. The conversation covers implementation gotchas, the role of open table formats, and why this approach beats traditional chargebacks. They also touch on how the practice is spreading from finance to retail and ad tech. A clear, practical episode for anyone running a data platform or paying a six-figure Snowflake bill. #DataContracts #CostAllocation #CloudDataWarehouse #Fintech #DataEngineering #DataGovernance #Snowflake #DataPlatform #FinOps #DataCosts #Chargeback #OpenTableFormats #DataProducers #DataConsumers #Analytics #BusinessPodcast #FexingoBusiness #DataInfrastructure Keep every episode free: buymeacoffee.com/fexingo
    12 mins
  • Why Data Teams Are Using Contract Testing for Data Pipelines
    Jun 28 2026
    Data contracts are getting a lot of attention as a way to enforce schema and quality guarantees between producers and consumers. But a new practice is emerging: applying contract testing, borrowed from software engineering, to data pipelines. Lucas and Luna explore how companies like Monzo are using consumer-driven contract tests to catch breaking changes before they hit production. They walk through a concrete example: a finance team's daily revenue report breaks because a source table column gets renamed. With contract testing, the pipeline fails fast during CI, not at 3 a.m. in a Slack alert. The episode covers the tooling landscape (from open-source Pact to dbt expectations), the organizational shift required, and why this approach is especially powerful for data mesh architectures. A practical look at how treating data pipelines like distributed services can reduce downtime and rebuild trust. #DataContracts #ContractTesting #DataPipelines #DataQuality #DataEngineering #Monzo #Pact #dbt #DataMesh #CIForData #PipelineTesting #DataObservability #DataGovernance #Business #Technology #FexingoBusiness #BusinessPodcast #DataBusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
    9 mins
  • How Data Teams Are Using Data Clean Rooms for Privacy-Compliant Analytics
    Jun 27 2026
    Episode 77 of The Data Business Podcast explores the rise of data clean rooms—secure environments where companies can join datasets for analytics without exposing raw data. Lucas and Luna dissect a specific case: how a major retailer and a CPG brand used a clean room to measure ad effectiveness without sharing customer-level data. They walk through the architecture, the trade-offs (query performance vs. privacy guarantees), and why clean rooms are becoming essential for compliance with regulations like GDPR and CCPA. Lucas brings numbers from a recent industry report showing a 140% increase in clean room adoption among Fortune 500 companies since 2023. Luna challenges whether clean rooms are a genuine privacy solution or a PR shield. The episode also covers the technical distinction between differential privacy and secure multi-party computation within clean rooms, and why data teams need to rethink their data-sharing contracts. Hosted by Lucas and Luna. #DataCleanRooms #PrivacyCompliance #GDPR #CCPA #DataSharing #Analytics #AdMeasurement #DifferentialPrivacy #SecureMultiPartyComputation #Retail #CPG #Fortune500 #DataContracts #DataArchitecture #PrivacyEngineering #Business #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
    9 mins
  • Why Data Teams Are Building Semantic Layers for Business Users
    Jun 27 2026
    In Episode 76 of The Data Business Podcast, Lucas and Luna explore the growing trend of semantic layers — a middle layer between raw data and business tools that lets non-technical users query metrics like 'monthly recurring revenue' without knowing SQL. They examine how companies like Airbnb and Intuit have implemented semantic layers using tools like Looker's LookML and Apache Calcite to reduce data team bottlenecks, improve governance, and speed up decision-making. The episode dives into a real example: how a fintech firm cut reporting turnaround time from two weeks to under an hour by adopting a semantic layer. They also discuss trade-offs like maintenance overhead and the risk of oversimplification. If you're building or running a data-driven organization, this episode offers concrete insights on whether a semantic layer is right for your team. #DataBusiness #SemanticLayer #BusinessIntelligence #DataGovernance #Looker #LookML #ApacheCalcite #EnterpriseData #SelfServiceAnalytics #DataArchitecture #MetricsStore #NoSQL #DecoupledDataStack #DataProduct #BusinessTechnology #AnalyticsEngineering #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
    11 mins