Issue 30: This Week in Data (Mar 9, 2021)
(1) Google wants to replace third-party cookies with FLoC. (2) Several universities may be using race as a predictor of success. (3) Uber's Go-Explore has learned to play all 50+ games in the Atari benchmark.
Hi all,
Hope you are having a great week filled with suspense. Will the British crown survive? Will half-court shots in the NBA become the new norm? Will NFTs ever make sense? OK, on to things we understand, like data.
Data in the real world
Day to Day Data
Oftentimes we talk about data at scale in the abstract, the use of data in organizations, or the insidious effects of data on our lives. But it's also important to look at how innovation in data systems gives us new ways to live and how it affects our day-to-day lives.
Data in Consumer Technology
Shopping - We all know delivery robots are coming, so check out Tortoise (currently piloting with Albertsons). In other news, Postmates' robotics division is spinning off into Serve Robotics.
Athletics - Read about how EXOS uses Intel's 3D Athlete Tracking (3DAT) to track athletes' joint movement in 3D space from video footage.
Beauty - MIT Technology Review takes a deep dive into Qoves Studio's facial assessment tool, which uses AI explainability to highlight facial features tied to perceptions of beauty.
Games - Uber AI Labs' Go-Explore has learned to play all the games in the Atari 2600 benchmark, which consists of over 50 games. From the paper: "A key observation is that sufficient exploration of the state space enables discovering sparse rewards and avoiding deceptive local optima."
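To make the "first return, then explore" idea concrete, here is a minimal Python sketch under stated assumptions. It is not Uber's code: the toy `ChainEnv` (a deterministic line of cells with a single sparse reward at the far end) and all function names are hypothetical. Go-Explore keeps an archive of "cells" (in Atari, roughly downscaled game frames; here just the position), repeatedly picks a rarely visited cell, returns to it by replaying the stored action sequence (this only works because the environment is deterministic), and then explores randomly from there, adding any new or better-reached cells to the archive. The paper's later "robustification" phase is omitted.

```python
import random
from dataclasses import dataclass


class ChainEnv:
    """Hypothetical toy environment: walk along positions 0..length.
    The only reward (+1) is at the far end, so it is sparse."""

    def __init__(self, length=40):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action is -1 (left) or +1 (right)
        self.pos = min(max(self.pos + action, 0), self.length)
        done = self.pos == self.length
        return self.pos, (1.0 if done else 0.0), done


@dataclass
class CellInfo:
    trajectory: list  # actions that reach this cell from reset()
    score: float      # total reward collected along that trajectory
    visits: int = 0   # how often this cell was selected for exploration


def go_explore(env, iterations=500, explore_steps=10, seed=0):
    rng = random.Random(seed)
    start = env.reset()
    archive = {start: CellInfo(trajectory=[], score=0.0)}
    for _ in range(iterations):
        # 1. Select a cell, favouring rarely visited ones.
        cells = list(archive)
        weights = [1.0 / (archive[c].visits + 1) ** 0.5 for c in cells]
        cell = rng.choices(cells, weights=weights)[0]
        info = archive[cell]
        info.visits += 1
        # 2. "First return": replay the stored actions (env is deterministic).
        env.reset()
        score, done = 0.0, False
        for a in info.trajectory:
            _, r, done = env.step(a)
            score += r
        trajectory = list(info.trajectory)
        # 3. "Then explore": take random actions from the returned-to cell.
        for _ in range(explore_steps):
            if done:
                break
            a = rng.choice((-1, 1))
            state, r, done = env.step(a)
            trajectory.append(a)
            score += r
            known = archive.get(state)
            # 4. Archive new cells, or a higher-scoring / shorter route to a known one.
            if known is None or score > known.score or (
                score == known.score and len(trajectory) < len(known.trajectory)
            ):
                visits = known.visits if known else 0
                archive[state] = CellInfo(list(trajectory), score, visits)
    return archive
```

Usage: `archive = go_explore(ChainEnv(length=10), iterations=2000)` should end up holding every position, including the rewarding end cell with its action sequence. Replaying the archive's trajectory is what lets the search keep pushing the frontier instead of starting every random walk from scratch.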
Science and Academia
Chemistry x ML - IBM releases MolGX, a platform for discovering new molecular structures given a variety of chemical properties. You can actually try it yourself! (Note: I studied chemical engineering but have no idea how to use this...)
Pharma x Vaccines - Drug stores are using COVID vaccinations as a way to collect consumer data for loyalty programs and marketing. If you get your COVID vaccine at a pharmacy, expect some mailers coming your way.
Universities x Predictive Analytics - Several major universities are using EAB, a third-party student risk assessment service that appears to use race as one predictor of future success. The Markup explains why this is dangerous beyond the obvious reasons and outlines the risks of predictive analytics.
Data Privacy
The Privacy Corner
Here's this week in the privacy corner...
After a dry spell following California's passage of its privacy law, the CCPA, in June 2018, states have finally begun moving privacy legislation forward at a fast pace. The general process is: legislative chamber A → chamber B → governor → law.
Virginia - Consumer Data Protection Act signed into law by the governor; on to implementation.
Utah - Facial recognition law passes the house (unanimously); on to the governor's desk.
Oklahoma - Computer Data Privacy Act passes the house (85-11); on to the senate.
Washington - Washington Privacy Act passes the senate (48-1); on to the house.
Google has started advocating for its replacement for the third-party cookie: FLoC (Federated Learning of Cohorts). The FLoC ID is "essentially an anonymized summary of your recent activity on the web".
The EFF (Electronic Frontier Foundation) strongly opposes this for a number of reasons, including potential advertising abuse, browser fingerprinting, and the risk of unintentional discrimination against protected classes.
Private public...partnership?
Vice reports that the Iowa Air National Guard (132d) used LocateX's commercial location data as part of operations.
Slate puts out a summary piece on the extent to which private app data is being shared with the US government through data brokers.
Small Bytes
DoorDash blogs about Riviera, its new real-time feature engineering framework, which uses a SQL-based DSL backed by Flink. (Disclaimer: I work here.)
The New York Times puts out an interactive data visualization on how melting ice affects the Gulf Stream; it's fun and a bit scary. Weather simulation is one hell of a data-intensive application, and we may see more of it moving forward.
Tom Cruise deepfake impersonator says "Actually it takes a really long time and lots of expertise to make these". Still spooky, but we can sleep soundly knowing most deepfakes are going to be crap (or that we can use them to fool facial recognition).
Facebook releases SEER, a new self-supervised computer vision model. It was trained on a billion Instagram pictures and aims to perform well without needing any labeled annotations.
Pinterest blogs about its content moderation tech stack. It combines ML models produced by an offline Spark tagging/training process with an online representation of those models for real-time predictions.
Netflix data scientists talk about a day in the life of experimentation. Always a nice slice-of-life piece that humanizes the people behind the work.
Tim Wu is joining Biden's National Economic Council. The antitrust professor from Columbia is expected to take aim at big tech.
Uber blogs about its ML-powered internal audit pipeline for the third-party intermediaries that work with Uber to execute tasks.
Corpo Updates
Tableau has entered the third-party data repository space, taking a page from Snowflake's data marketplace playbook.
Domino blogs about how to define metrics to demonstrate success in data science models and products.
Amazon releases a reinforcement learning toolkit to support robot workflows in AWS RoboMaker.
Industry and Fundraising
SnapCommerce - $85M for AI-powered messaging for e-commerce
Kvantum acquired by Yum Brands for marketing intelligence
Precisely acquired by Clearlake Capital for IBM mainframe integration solutions
This Week in Data is a weekly newsletter to help you stay up to date with developments in the data ecosystem. My goal is to highlight broader data trends for data professionals and enthusiasts who are interested in data and its applications. Topics include infrastructure, AI/ML, experimentation, analytics/BI, privacy, and security.