
DSB #109


I think it’s Friday again and DSB is on its way, a bit late, I admit. Try the mini game in Graphs and Visualizations, or read the long article in Analytical about why we partly failed at modelling COVID-19. Also learn more about data engineering: you are going to need it in the near future, as the first article in Business and Career explains.

As always, enjoy your reading.

Analytical
– A long, really long read on why it has been so difficult to model COVID-19.
– Large generative language models like GPT-2 are vulnerable to several types of attack, especially on the privacy of their training data: pieces of that data can be extracted from the model. (rcmd by reader)
– How to monitor data quality at scale.

Computer Science & Science
– The license battle between Elastic and AWS. AWS uses Elastic’s open-source software for free and Elastic wants money, so Elastic is changing its license and AWS is not happy. (rcmd by reader)
– Yes, you should use a virtual environment for your projects (if you can), but PDM, Python Development Master, offers another way to manage packages. (rcmd by reader)
– Let me introduce Hotwire: you send HTML instead of JSON over the wire and get fast first-load pages.

Graphs and Visualizations
– Who is a programming language inventor and who is a serial killer… try to tell them apart 😀 (rcmd by reader)
– The theory of ML expressed in bad drawings, but still an interesting read that summarizes many theoretical principles of ML.
– A review of density plots and their advantages.

Business and Career
– It’s true, we really need more data engineers, or at least more data scientists with engineering skills. That’s the real bottleneck.
– I don’t like lists of trends, most of the time they’re lazy, but this one looks reasonable: 5 data trends for CDOs in 2021.
– Apple lets people disable tracking but asks about it only for third-party apps, and Facebook strongly disagrees.

Pop
– An honest and raw opinion on ML and why, at least in academic research, it is almost stagnating.
– Spotify aims to use your mood, speech and background noise to recommend suitable music. Seems terrifying to me. (rcmd by reader)
– Try out how successful you would be at handling the coronavirus crisis in the Czech Republic. At the end you can compare yourself with other players. (rcmd by reader)

Education
– How to work with Lambda layers in Keras when, for example, you want to adjust some dense layer.
– Free e-book Graph Representation Learning, which teaches methods for embedding graph data, graph neural networks, and deep generative models of graphs. (rcmd by reader)
– Google Cloud offers free online courses focused on data analytics, AI, machine learning, and cloud services. (rcmd by reader)

Data & Libraries
– Do you need a framework or library for ML in Python? You’ll find it here! A weekly updated list with anything you need. (rcmd by reader)
– Sequitur provides autoencoders for sequential data and it’s built on PyTorch. (rcmd by reader)

Video & Podcast
– Did you play Pokemon Red and have you ever wondered who is the best NPC trainer in the game? The author of the video took all the trainers and let them battle, then measured their results via Elo and ordered them into tiers based on kernel density estimation. Funny and truly interesting data science in practice. (rcmd by reader)
– Architecture of ML systems.
– Video about Switch Transformers by Google Brain. Some call it the next step after GPT-3.
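For the curious, the two ingredients named in the Pokemon Red item (Elo ratings and kernel density estimation) fit in a few lines of plain Python. This is only a rough sketch, not the video author's code: the K-factor of 32, the bandwidth of 50, the function names and the sample ratings are all assumptions made for illustration.

```python
import math


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return new (A, B) ratings after one battle; score_a is 1, 0.5 or 0."""
    # k=32 is an assumed K-factor, not a value taken from the video.
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def gaussian_kde(samples, x, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


# Hypothetical final ratings: two clusters that could form two tiers.
ratings = [1000.0, 1010.0, 1500.0, 1510.0]

# A dip in the estimated density between clusters suggests a tier boundary.
valley = gaussian_kde(ratings, 1255.0, bandwidth=50.0)
peak = gaussian_kde(ratings, 1005.0, bandwidth=50.0)
```

Running every pair of trainers through `update_elo` many times gives a rating per trainer; evaluating `gaussian_kde` over a grid of rating values and cutting at the low-density dips is one plausible way to turn those ratings into tiers.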

Papers & Books
– Old but gold? Who knows. But papers about lead management seem rare, so give it a chance. (rcmd by reader)
– Vision-language tasks are another interesting area in deep learning. Microsoft improved its model called Oscar by adding visual features, as you can read in the paper. Code for Oscar is here. (rcmd by reader)
– Embedding with a manifold density estimator for recommendation systems by Synerise. (rcmd by reader)

Behind the Fence
– Data Scientist at Apple, New York City, USA.

