Welcome to “This Week in Data”. My goal is to bring focus on broader data trends to data professionals and enthusiasts who are interested in data and its applications. Thank you for subscribing, and I would love any feedback to make this newsletter work better for you.
The Timnit Gebru resignation-firing has been a fast-paced news cycle since a tweet on Wednesday night. Timnit is a respected AI ethics researcher known for founding Black in AI and the FAccT conference, and for speaking out on racial bias in AI. She was let go by Jeff Dean, who preemptively decided not to meet a set of ultimatums she had given to the group. The ultimatums had been triggered by Google's supposedly unreasonable rejection of her paper criticizing the impacts of current language models. Timnit afterwards sent a scathing internal email about diversity efforts at Google, which was the straw that broke the camel's back. This is what we know:
The details:
The Research Paper - Karen Hao from MIT Technology Review sheds some light on what the paper looked like. It seems to cover the usual critiques: AI bias, the high energy consumption of language models, and a reliance on correlation over understanding. It is rumored to have been shuttered for being anti-Google, but it doesn't seem very radical.
The Emails - Casey Newton (ex-Verge) on Platformer released both Timnit's original email and Jeff Dean's response to the organization. The original email from Timnit to the Google Brain Women and Allies team told people to stop participating in DEI (Diversity, Equity and Inclusion) because it was all show and no substance.
What this means:
The Chilling Effect - Anna Kramer from Protocol discusses how the firing of an ethics researcher could have a chilling effect on AI Ethics programs and how it heavily delegitimizes Google's claim to be a leader in Ethical AI.
An Overview of Timnit - Khari Johnson from VentureBeat gives an overview of Timnit's work over the past year as an icon in AI Ethics and a researcher of AI bias.
The Demands - Google Walkout set up #ISupportTimnit and #BelieveBlackWomen to pressure Google to increase transparency of decision making.
My Personal Thoughts - As with most controversies like this, the debate gravitates towards the mechanics of the controversy more than the underlying issues that led to this conclusion. (See Reddit, Twitter, article comments.) Challenges with DEI (Diversity, Equity and Inclusion) programs are likely not limited to Google, and companies should keep Timnit's email in mind. No side is perfect, but I hope this leads to advancements in ethics and diversity in tech.
Data Quality, Discovery, Governance, Catalog, pick your word. As the space heats up, both big and small players are attacking the problem from a variety of angles. I suspect that in the coming years there will be combination and consolidation of efforts to make sense of your data. Here are some updates and posts from just this week.
Data Classification. Microsoft releases Azure Purview, focused on data classification and protection, alongside a comprehensive catalog product.
ML Pipelines. Amazon releases SageMaker Data Wrangler for AI pipeline management.
Data Catalog. Mark Grover (Amundsen PM) and team start Stemma, a company built around Amundsen. Here's the intro blog post.
Data Quality. Jeremy Stanley (Anomalo) piggybacks on Airbnb's data quality blog posts to describe how their team thinks about implementing data quality checks as a product.
Just Thoughts. Tristan Handy (dbt, Fishtown) shares thoughts about how data governance will be a significant piece of the "second cambrian explosion" of the modern data stack.
Small Bytes
Google DeepMind's AlphaFold II achieves nearly 90% accuracy for 3D protein folding based only on the amino acid sequence. The AlphaFold team won the CASP competition 2 years ago at 55%, and has now achieved accuracy equivalent to experimental methods. Lots of tech science is buzz, but not this one. (Other articles: Nature, VentureBeat)
AI Art is now being created by the French art collective Obvious. The most recent print is a mashup of cave art in Lascaux and a German graffiti artist, priced at €250 for 50 limited edition prints.
The Adobe team talks about Iceberg (the Apache table format) in its real-time data platform for Adobe Experience Platform. It's a pretty standard end-to-end data ecosystem, but it gives insight into design decisions across the architecture.
Bloomberg has a series of visualizations for vaccine distribution. It has an overview of available vaccines, a cute little Sankey on where vaccines are going to be distributed, and an overview of what we can expect to see in coming months.
Our World in Data has a piece on renewables and how they are getting cheaper over time. It shows how the economics of wind and solar have improved, while coal and nuclear have not followed learning curves.
Industry and Fundraising
Ultimate.ai - $20M for conversational AI
Materialize - $40M for stream materialized views (HN discussion)
Genesis Therapeutics - $52M for AI drug discovery
Flock Freight - $113.5M for algorithmic truck pooling logistics
Scale AI - $155M at a $3.5B valuation for data labeling for AI
ServiceNow - acquires Element AI for enterprise AI capabilities
“This Week in Data” is a weekly newsletter to help you stay up to date with developments in the data ecosystem. Topics include infrastructure, AI/ML, experimentation, analytics/BI, privacy, security.