Issue 28: This Week in Data (Feb 24, 2021)
(1) Data breaches hit California DMV & Jamaican Immigration vendors (2) Perseverance lands on Mars with AI-assisted software (3) Google AI Ethics group reorganization following the Timnit saga.
Hi TWID,
Hope you all had a great week. With the Texas snows and March coming soon, it’s pretty crazy how time flies. And in case you missed it, stock market data is data too. Here’s the week in data:
Data Security
Interconnected Data = Interconnected Data Hacks
Now that we have an interconnected data community, a breach is no longer a one-to-one relationship between hacker and hacked. It's not just phishing internal networks or credential stuffing (brute-force testing of breached username/password lists on multiple websites to take advantage of password reuse). It's all getting a little complicated. Especially as governments outsource software development to the private sector, the lowest common denominator of negligence has wide-ranging effects. This week, we have updates on…
DMV Vendors. AFTS (Automatic Funds Transfer Service), a vendor for the California DMV (Department of Motor Vehicles), was hit by a data breach by a Cuban ransomware group. The DMV has changed its service provider, but the extent of the damage is unclear.
COVID Vendors. A Jamaican immigration website exposed travelers' data through what seems to be a public AWS server. Later in the week, it was found that private keys and passwords were also exposed in an environment variable file. Negligence is expensive.
"It’s not known for how long the data was unprotected, but contained more than 70,000 negative COVID-19 lab results, over 425,000 immigration documents authorizing travel to the island — which included the traveler’s name, date of birth and passport numbers — and over 250,000 quarantine orders dating back to June 2020."
Multi-level Data Extortion. Accellion's legacy File Transfer Appliance was hacked via a zero-day in January. It was supposedly fixed, but earlier this month a number of Accellion's clients were subject to extortion or had their data dumped on the dark web. The extortion chain from Accellion → Law Firm → Law Firm's Client shows how dangerous a single breach can be.
This news comes on the heels of the SolarWinds hack, from which we seem to discover more affected surfaces every day. Even though that one is not immediately data-related, we will keep our eyes peeled to see how data exfiltration ends up impacting geopolitics both big and small.
Tech Companies
From Infrastructure to Ethics
Across the board, we have the medium and large tech companies grinding to build infrastructure that works for increasingly expansive applications and use cases. It’s always fun, and sometimes useful, to see what others are up to. Here are the latest blogs:
Stream Processing. Lyft talks about how they fight data skew in stream processing through an improved distributed logic, improved monitoring, salting, and improved reshuffling.
Schema-Agnostic Logging. Uber talks about their new ClickHouse-based log system designed to replace their 2014 ELK pipeline.
Data Governance and Security. Salesforce talks through governance and security features they've built into their Hadoop ecosystem. Topics include Network security, Kerberos (SSO authentication), RBAC and least privilege access, and more.
Experimentation. Squarespace talks about their A/B testing culture, setup, and technology. Not revolutionary, but some screenshots and content can help give ideas on how to improve your experimentation processes.
Data Quality & Sales Insights. LinkedIn talks about releasing its new sales insight data product to help its customers get better data. The blog focuses on how they maintain data quality for a production application based on data pipelines.
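For readers who haven't run into the "salting" mentioned in the stream processing item above, here is a minimal sketch of the general idea in plain Python. It is not Lyft's implementation: the function names, the `#` separator, and the `NUM_SALTS` value are all assumptions made for illustration. A hot key is split into several salted sub-keys so that no single worker has to count all of its events, and a second stage strips the salt and merges the partial counts.

```python
import random
from collections import Counter

NUM_SALTS = 4  # hypothetical fan-out: how many sub-keys each key is spread over


def salted_key(key, num_salts=NUM_SALTS, rng=random):
    """Append a random salt so one hot key becomes up to num_salts sub-keys."""
    return f"{key}#{rng.randrange(num_salts)}"


def two_stage_count(events, num_salts=NUM_SALTS, seed=0):
    """Count events per key in two stages.

    Stage 1 aggregates on salted keys (this is the part that would be
    spread across workers). Stage 2 removes the salt and merges the
    partial counts back into one total per original key.
    """
    rng = random.Random(seed)

    # Stage 1: partial counts keyed by "<key>#<salt>"
    partial = Counter()
    for key in events:
        partial[salted_key(key, num_salts, rng)] += 1

    # Stage 2: strip the salt (everything after the last '#') and merge
    totals = Counter()
    for skey, count in partial.items():
        original, _, _ = skey.rpartition("#")
        totals[original] += count

    return partial, totals


if __name__ == "__main__":
    events = ["hot"] * 1000 + ["cold"] * 10
    partial, totals = two_stage_count(events)
    print("largest single bucket:", max(partial.values()))
    print("merged totals:", dict(totals))
```

The trade-off is an extra aggregation stage in exchange for smaller per-worker buckets, which is why salting is usually applied only to keys known to be skewed.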
As for the big companies, they get to be in the news around data ethics. Once you have a well-oiled machine, what you do (and don't do) with it becomes the point of contention.
Ethics in AI. Google is trying to enter its next phase in the Timnit Gebru saga by restructuring the Ethical AI team under VP Marian Croak. We will see if the new leadership will be able to right the ship and regain the confidence of employees in the group.
Ethics in Advertising. Facebook is under fire for knowingly over-reporting its "potential advertising reach" metric by not deduplicating against fake accounts. The story resurfaces as discovery in the 2018 lawsuit reveals new emails.
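To make the deduplication point concrete, here is a toy sketch of how a reach number can be inflated when accounts are counted instead of people. This is purely illustrative and says nothing about how Facebook actually computes the metric: the `person_id` and `is_fake` fields and the `potential_reach` function are hypothetical.

```python
def potential_reach(accounts):
    """Return (raw_account_count, deduplicated_people_count).

    `accounts` is a list of dicts with two hypothetical fields:
    `person_id` (the real person behind the account) and `is_fake`.
    """
    raw = len(accounts)
    # Drop fake accounts, then count each real person only once
    real_people = {a["person_id"] for a in accounts if not a["is_fake"]}
    return raw, len(real_people)


if __name__ == "__main__":
    accounts = [
        {"person_id": "p1", "is_fake": False},
        {"person_id": "p1", "is_fake": False},  # second account, same person
        {"person_id": "p2", "is_fake": False},
        {"person_id": "bot", "is_fake": True},  # fake account
    ]
    raw, deduped = potential_reach(accounts)
    print(f"raw accounts: {raw}, people actually reachable: {deduped}")
```

The gap between the two numbers is the over-reporting an advertiser would care about.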
Small Bytes
Map of Power outages in Texas. As the power outages started, we could see a map of how each region was affected. As of this newsletter, it looks like power is mostly back for all customers, but wishing Texas the best.
LAPD used Ring to surveil BLM protests earlier this year. The police department sent requests to Ring users for video footage captured by their doorbell cameras. These requests were facilitated through Amazon, and they set a scary trajectory for these private video cameras.
Vietnam is drafting a data privacy law that emulates several other privacy laws around the world.
Protocol interviews an anonymous ByteDance employee about their TikTok censorship machine, which employs over 20,000 content moderators. The interview comes out of the employee's guilt over participating in censoring content about Dr. Li Wenliang and in Uyghur language monitoring.
One thing I found interesting was a weird perversion of Section 230-style legal liability, with the government as the source of the liability. "What Chinese user-generated content platforms most fear is failing to delete politically sensitive content that later puts the company under heavy government scrutiny."
The Mars rover Perseverance landed last week, loaded with technological improvements in robotics and AI. From landing to navigation, it's crazy to see the Mars robot competitions of the 90s materialize into what we have today.
Corpo Updates
Apache Kafka is removing its ZooKeeper dependency, and ksqlDB 0.15 updates its features
Twilio now supports webhooks for Event Streams
Grafana adds a new Logs functionality to expand their monitoring ecosystem
Industry and Fundraising
Edgybees - $9.5M for AI for augmented reality in drone footage
AdmitHub - $16M for chatbots for enrolling and staying in schools.
Photomath - $23M for helping students solve math problems with CV
Boast.ai - $100M for helping startups secure R&D tax credits
Recogni - $48.9M for AI perception chips for self-driving cars
Personetics - $75M for personalized bank advice
TigerGraph - $105M for large scale graph database
Locus Robotics - $150M for warehouse robotics platform
This Week in Data is a weekly newsletter to help you stay up to date with developments in the data ecosystem. My goal is to bring broader data trends into focus for data professionals and enthusiasts who are interested in data and its applications. Topics include infrastructure, AI/ML, experimentation, analytics/BI, privacy, and security.