Issue 31: This Week in Data (Mar 24, 2021)
(1) MIT Technology Review chronicles Facebook's struggle to do responsible AI. (2) The ball starts rolling on Automatic License Plate Reader (ALPR) regulation. (3) GPT-3 pickup lines and Twitter deepfakes become the norm.
Hi all,
I hope everyone is doing well during a tough few weeks. As an Asian-American, I find it defeating both to see the events of recent weeks and to read the data on AAPI hate crimes over the past year.
In more personal news, apologies for missing last week (I was moving cross-country), so this week's issue is a double-decker.
Data Privacy
Automatic License Plate Readers and More
The Washington Post published an overview of the Automatic License Plate Reader (ALPR) landscape. It's a great read on how ALPRs have become increasingly pervasive in law enforcement as a tool to circumvent traditional search warrants.
This technology is advertised to help track stolen cars or flag cars that enter a specific neighborhood.
However, privacy groups maintain that collecting everyone's information by default is a huge infringement on individual privacy rights.
As the EFF puts it, "With just a few keystrokes, police can search the historical travel patterns of a vehicle or identify vehicles that visited certain locations."
New proposed California legislation from State Senator Scott Wiener (SB 210) aims to keep ALPR data from being misused by law enforcement.
In other data privacy news, we have the usual roundup of ways that cool new technologies inadvertently (or intentionally) chip away at our individual privacy through both corporate and government mechanisms.
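To make the EFF's point concrete, here is a toy sketch of why retained ALPR reads are so revealing: once every plate sighting is stored with a time and a location, reconstructing one car's history or listing every car that passed a given spot is a simple filter. The record layout, plates, and camera names below are all hypothetical; real ALPR systems and their query tools will differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PlateRead:
    plate: str
    camera: str        # hypothetical camera/location identifier
    seen_at: datetime


# Hypothetical retained reads; a real deployment keeps millions by default.
READS = [
    PlateRead("7ABC123", "main_st_and_1st", datetime(2021, 3, 1, 8, 5)),
    PlateRead("4XYZ987", "clinic_parking", datetime(2021, 3, 1, 9, 30)),
    PlateRead("7ABC123", "clinic_parking", datetime(2021, 3, 1, 9, 45)),
    PlateRead("7ABC123", "highway_101_n", datetime(2021, 3, 2, 17, 20)),
]


def travel_history(reads, plate):
    """Every sighting of one vehicle, oldest first."""
    return sorted((r for r in reads if r.plate == plate), key=lambda r: r.seen_at)


def vehicles_at(reads, camera, start, end):
    """Every distinct plate seen at one location within a time window."""
    return sorted({r.plate for r in reads
                   if r.camera == camera and start <= r.seen_at <= end})


if __name__ == "__main__":
    for r in travel_history(READS, "7ABC123"):
        print(r.seen_at, r.camera)
    print(vehicles_at(READS, "clinic_parking",
                      datetime(2021, 3, 1), datetime(2021, 3, 2)))
```

Note that neither query needs a suspect up front: collecting everyone's reads by default is what makes both searches possible.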
Security Cameras. Verkada was hacked, giving attackers access to the fleet of 150,000 cameras behind its security product. The hackers had gained access to a Super Admin account; investigations revealed insufficient controls on these highly privileged administrative accounts.
Prescriptions. The California DOJ aims to block Walgreens' access to the state's database of patient prescriptions. COVID has been a golden opportunity for companies to quickly gather data amid the chaos, but the government is starting to draw the line.
Post-mortem Privacy. Alexandrine Royer from the Montreal AI Ethics Institute writes about the ethical implications of, and the lack of legal privacy recognition for, deceased individuals in the wake of news of a Microsoft chatbot emulating the dead.
Military x Cars. Vice reports on how surveillance contractor Ulysses advertises a product that geolocates vehicles around the world by buying car sensor data and combining it in its database. Ulysses then sells these capabilities to military customers.
Myanmar x Military x Faces x Cars. Myanmar has rolled out an extensive surveillance system equipped with facial recognition and license plate recognition. This likely empowers the military junta that has seized control and is shooting protesters.
Faces. Kashmir Hill from the NYT recaps her past year's reporting on Clearview AI and the risks this facial recognition app (built by scraping social media) poses to society.
AI Responsibility
AI Responsibility is social, ecological, and global
AI is powerful, high-leverage, and often in the hands of a few key decision makers. For them, the weight of ethics is heavy and stressful, and acting on it is often a complex, uphill battle. I don't envy these decision makers, but one way to raise awareness of AI ethics overall is to tell their stories and build public understanding of its complexity and nuance.
Facebook is the big one these days. Karen Hao from MIT Technology Review published a major piece profiling the failure of AI responsibility initiatives at Facebook. It chronicles a well-respected leader's struggle to get AI ethics prioritized over relentless growth within the organization (hint: growth wins).
Another part of responsibility is AI and the planet. AI both helps fight and contributes to climate change, notes the Montreal AI Ethics Institute team, which offers some recommendations on how to think about this tension. Three things it identifies as important are level-setting, understanding potential bias, and understanding the energy impact of increasingly complex AI models.
Finally, AI ethics is also top of mind in the 2021 AI Index Report compiled by Stanford HAI (the Stanford Institute for Human-Centered Artificial Intelligence). It is yet another 140-page tome, but worth a skim.
Small Bytes
A 50-year-old Japanese biker reveals that his Twitter persona was a deepfake/faceswap all along. He posed as a young woman touring Japan by motorbike. The account started back in Aug 2020, and its owner admits, "I get as many as 1,000 'likes' now, though it was usually below 10 before...I got carried away gradually as I tried to make it cuter."
Janelle Shane shares some GPT-3 pickup lines on her AI Weirdness blog to promote her book. Our favorite: "I love you. I don't care if you're a doggo in a trenchcoat."
Photoshop adds an AI tool for high-quality image upscaling. The new Super Resolution feature is advertised to work well for quadrupling the pixel count (2x the width and 2x the height).
Amazon's S3 (Simple Storage Service) turns 15 years old. Tom Krazit from Protocol writes a digestible piece on its history and how it became the critical service it is today.
Intel to invest $20B in two new fabs in Arizona. These fabrication plants expand its Chandler, AZ campus and will primarily produce chips on leading-edge process nodes.
The Washington Post produces an interactive visualization of COVID over the past year, with snippets of stories that remind us just how different and crazy each month was. An awesome combination of data and storytelling.
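A quick sanity check on the "4x" framing: Super Resolution scales each side by 2, so the pixel count grows by 2 x 2 = 4. The 6000 x 4000 (24-megapixel) source size below is a hypothetical example, not a figure from Adobe.

```python
from typing import Tuple


def upscale_dims(width: int, height: int, factor: int = 2) -> Tuple[int, int]:
    """Scale each side by `factor`; the total pixel count grows by factor**2."""
    return width * factor, height * factor


w, h = 6000, 4000                    # hypothetical 24 MP source image
new_w, new_h = upscale_dims(w, h)    # 2x per side
print(new_w, new_h, (new_w * new_h) / (w * h))  # 12000 8000 4.0
```

So a 24 MP photo comes out at roughly 96 MP, which is why the file size jumps so much after upscaling.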
Corporate Blogs
Data Culture [at] Uber. The data team blogs about scaling challenges beyond technology. The post covers source of truth, discovery, disconnected tooling, process, and ownership.
Flink [at] Pinterest. The team discusses the architecture, build and deploy process (Bazel → YARN), as well as the challenges and next steps the team is tackling for Flink at Pinterest.
Data Labeling [at] Instacart. The team discusses a 7-step preflight checklist to run before crowdsourcing data labels through Mechanical Turk-like services, in this case for grocery product labeling.
Flyte [at] Lyft - The ML orchestration tool Flyte joins the Linux Foundation AI & Data (LF AI & Data), following Amundsen's earlier entry. A great case study in the growth of an OSS project and its journey forward.
ML Feature Serving [at] Lyft - The company blog does a double-header with a post on the ML architecture behind its Feature Service. The diagram looks much like most feature service architectures, but it's always a good read to see what developers focus on and care about.
Data Pipelines [at] Samsara - The data team talks about its AWS Step Functions (SFN) based DAG system, which builds on a Spark / Delta Lake ecosystem with a DynamoDB coordinator. An example of a home-grown pipeline system that doesn't use Airflow/Dagster/Prefect/Oozie.
Cubed [at] Yahoo - The team open sources Cubed, a "data mart as a service" platform that aims to provide an end-to-end data workflow ending in a Druid OLAP database accessible to BI tools.
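At its core, any DAG pipeline system, home-grown or Airflow, has to run each task only after all of its upstream tasks finish. The toy sketch below shows that scheduling logic with Python's standard `graphlib` (3.9+) and hypothetical task names. It is not Samsara's implementation, which per the post runs on Step Functions with a DynamoDB coordinator.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "ingest_raw": set(),
    "clean_events": {"ingest_raw"},
    "build_trips": {"clean_events"},
    "build_alerts": {"clean_events"},
    "daily_report": {"build_trips", "build_alerts"},
}


def run_task(name: str) -> None:
    # Placeholder for real work, e.g. submitting a Spark job.
    print(f"running {name}")


def run_pipeline(dag):
    """Run tasks in dependency order, batch by batch; return the batches."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()                        # also raises CycleError on cycles
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstreams are all done
        for task in ready:                  # tasks in a batch could run in parallel
            run_task(task)
        sorter.done(*ready)
        batches.append(ready)
    return batches


if __name__ == "__main__":
    print(run_pipeline(PIPELINE))
```

Each batch is a natural fit for a parallel step in an orchestrator; a production system additionally has to persist task state (the role a coordinator like DynamoDB plays) so runs can resume after failures.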
Industry and Fundraising
TactileAI - $1M for consumer data analytics for beauty companies
TerminusDB - $4.3M for knowledge graph database
Language I/O - $5M for realtime language translations
OctoML - $28M for ML deployment optimization using Apache TVM
Shelf Engine - $41M for AI powered sales prediction and inventory management
Flex Logix - $55M for FPGAs
Aqua Security - $170M for container infra security
Orca Security - $210M for cloud tooling security
Ouster IPO via SPAC as a maker of Lidar devices
Peloton acquires Aiqudo Inc. for voice assistant technology
Optimizely acquires Zaius to get a customer data platform
Talend, a data integration company, sold to PE firm Thoma Bravo for $2.4B
Baidu's Kunlun raises an undisclosed amount at a $2B valuation for AI chips
This Week in Data is a weekly newsletter to help you stay up to date with developments in the data ecosystem. My goal is to bring focus to broader data trends for data professionals and enthusiasts who are interested in data and its applications. Topics include infrastructure, AI/ML, experimentation, analytics/BI, privacy, and security.