This Week in Data (July 17, 2020)
Issue 1 | July 17, 2020
Welcome to “This Week in Data”. My goal is to bring broader data trends into focus for data professionals and enthusiasts interested in data and its applications. Thank you for signing up! I’m still figuring out the format and tone of this newsletter and would love any feedback.
New GPT-3 AI generates more dad jokes, realistic poetry, and HTML layouts. Last month, OpenAI released GPT-3, a generative language model that produces polished prose and eerily human writing. In recent weeks, creative individuals such as Gwern have generated a broad selection of high-quality dad jokes, prose, and Edgar Allan Poe poetry. Another fun one is Arram's hilarious Elon Musk Dr. Seuss poetry. Although the model struggles with nonsensical questions (such as “how many eyes does my foot have?”), it draws on a broad knowledge base across multiple languages and can even generate code or HTML layouts. OpenAI announced the release last month through a beta waitlist, with plans to offer the API as a commercial product.
Uncertainty in EU/US data transfer agreements as Privacy Shield is struck down. On July 16, 2020, the Court of Justice of the European Union ruled in the "Schrems 2.0" case that EU citizens weren't sufficiently protected from US government surveillance. US companies that rely on Privacy Shield to satisfy the data transfer requirements of GDPR may have to rethink their strategy, either through standard contractual clauses (SCCs) or potential new laws. Amid numerous articles full of unintelligible legalese, I found this great article explaining Privacy Shield, SCCs, and related terms, written when the case started a little over a year ago.
Sparklines
Athena now supports querying Apache Hudi datasets. Hudi is a recently open-sourced project from Uber that enables incremental processing and transformations.
DataViper, an aggregator of hacked data, was hacked. The startup sold access to a database of 15 billion records collated from 8,000+ website breaches and acquired across the dark web.
Dropbox shows a creative use of “snapshot testing” on mobile event logs to ensure tracking quality. This helps developers quickly identify double-logging and unreadable tracking, and build confidence that feature releases don't introduce regressions in analytics.
EFF launches database of police agencies using surveillance tech. The privacy organization launched its Atlas of Surveillance, a map visualization that shows police use of high-tech surveillance technologies. Takeaway: Florida is a hotbed of facial recognition software.
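
For readers curious what the “snapshot testing” idea in the Dropbox item might look like in practice, here is a minimal Python sketch. It is not Dropbox's implementation: the function names, the event fields, and the choice of which fields count as volatile are all assumptions made for illustration. The idea is to record the events a user flow emits once, store them as a snapshot file, and fail the check whenever a later run logs something different (for example, the same event twice).

```python
import json
from pathlib import Path

# Hypothetical field names: values that change on every run and would
# otherwise make every snapshot comparison fail.
VOLATILE_FIELDS = {"timestamp", "session_id"}


def normalize(events):
    """Drop volatile fields so the snapshot is stable across runs."""
    return [
        {k: v for k, v in event.items() if k not in VOLATILE_FIELDS}
        for event in events
    ]


def check_snapshot(events, snapshot_path):
    """Compare logged events to a stored snapshot.

    On the first run (no snapshot file yet) the snapshot is written and
    the check passes. On later runs it returns True only if the
    normalized events match the stored snapshot exactly.
    """
    current = json.dumps(normalize(events), indent=2, sort_keys=True)
    path = Path(snapshot_path)
    if not path.exists():
        path.write_text(current)
        return True
    return path.read_text() == current
```

Because the stored snapshot is readable JSON, a failing check can be reviewed as a plain text diff, which makes a double-logged or renamed event easy to spot before a release.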

Fundraising & Industry
Abacus.AI (formerly RealityEngines.AI) raises $13M Series A for deep learning as a service.
Snorkel.AI exits stealth with rule-based dataset tagging.
VentureBeat Transform 2020 AI Innovation awards have been announced in the categories of NLP, Business Applications, CV, and AI for Good. Women in AI winners will be announced this week.
“This Week in Data” is a weekly newsletter to help you stay up to date with developments in the data ecosystem. Topics include infrastructure, AI/ML, experimentation, analytics/BI, privacy, and security.
