DSB is here: Sunday is ending and Monday is uncomfortably close. The main topic of this volume is overparameterization, with two extremely interesting articles in Analytical. Web3 is now a trendy buzzword, and everybody is writing about it or investing in it, as you may read in Business and Career or Computer Science & Science. I would also recommend a Twitter thread on state-of-the-art ML methods suitable for tabular data in Education.
And as always, enjoy your reading.
Analytical
https://lilianweng.github.io/lil-log/2019/03/14/are-deep-neural-networks-dramatically-overfitted.html – An amazing and detailed article that describes the ability of neural networks with a huge number of parameters to generalize to out-of-sample data points. (rcmd by reader)
https://www.quantamagazine.org/computer-scientists-prove-why-bigger-neural-networks-do-better-20220210/ – A similar topic: overparameterization of neural networks seems to be mandatory for robustness and, maybe consequently, for generalization.
https://eng.uber.com/project-radar-intelligent-early-fraud-detection/ – An exceptional article because it is practical rather than purely theoretical. It introduces RADAR, Uber's fraud detection system that combines machine learning with rules. (rcmd by reader)
Computer Science & Science
https://www.quantamagazine.org/researchers-build-ai-that-builds-ai-20220125 – Hypernetworks help tune neural networks. Basically, one neural network to tune them all and predict the parameters for a new network.
https://spectrum.ieee.org/ai-chip-design-matlab – What changes are possible in chip design thanks to AI?
https://dev.to/dabit3/the-complete-guide-to-full-stack-web3-development-4g74 – If you are into web3 like almost everybody (see the articles in Business and Career), then maybe you would like to know what is hidden under the hood. Read this guide to full stack web3 development. Inside you will also find articles on full stack Ethereum dev and others.
Graphs and Visualizations
https://www.datavisualizationsociety.org/report-2021 – Go through this report to see the state of data visualization all over the world. The number of respondents is 2164.
https://pudding.cool/2022/02/women-in-headlines/ – One of those storytelling visualizations. This one is about women in news headlines.
Business and Career
https://www.bloomberg.com/news/newsletters/2022-02-11/the-metaverse-makes-no-sense-and-here-s-why – Metaverses are everywhere, or at least they will be… or maybe not. Maybe they don’t make sense at all. And they are not a new concept.
Polygon raised 450 million USD with its portfolio of Ethereum scaling solutions.
https://techcrunch.com/2022/02/08/alchemy-which-aims-to-be-the-de-facto-platform-for-developers-to-build-on-web3-raises-another-200m-and-is-now-valued-at-10-2b – Another company betting on web3 is Alchemy, now valued at 10.2 billion USD. So it seems that investors are betting on it too.
https://www.comparitech.com/blog/vpn-privacy/countries-netflix-cost/ – How much does Netflix cost in different countries? How many shows and movies are provided, and what is the cost per title? Those and many other catchy metrics and comparisons are in the article. (rcmd by reader)
https://www.notonlycode.org/relearned-typing/ – If you suffer from wrist pain, then you need a split keyboard. I can confirm that it works like magic.
https://nerdlegame.com/ – Almost everybody plays Wordle, but have you tried Nerdle? (rcmd by reader)
Education
https://twitter.com/marktenenholtz/status/1490671701884952576 – Tabular data is not that sexy, but in business you work with it on a daily basis. So read this Twitter thread about the best ML methods to use on tabular data.
https://bayesianquest.com/2022/01/03/building-self-learning-recommendation-system-using-reinforcement-learning-part-i/ – The first part of a series on building a self-learning recommendation system. (rcmd by reader)
https://analyticsindiamag.com/deepmind-shares-a-list-of-free-ai-ml-resource/ – A curated list by DeepMind of free AI and ML resources, categorised by difficulty level. (rcmd by reader)
Data & Libraries
https://www.reddit.com/r/dataengineering/comments/slolx6/whats_your_data_engineering_stack_at_your_company/ – A Reddit thread about the data engineering stacks used at different companies.
https://eng.uber.com/how-data-shapes-the-uber-rider-app/ – Uber's data and its role in the Uber Rider app, from collection to processing and the ultimate effect on the app. (rcmd by reader)
https://github.com/mljar/mercury – Mercury is a Python library that converts your Python notebook to a web app.
https://buttondown.email/nelhage/archive/two-reasons-kubernetes-is-so-complex/ – The complexity of Kubernetes (K8s). You probably know the word, know about containerization and deployment, but do you really know how deep the rabbit-hole goes? (rcmd by reader)
https://www.shreya-shankar.com/rethinking-ml-monitoring-1/ – A four-part series about ML monitoring that covers multiple issues one needs to solve.
https://huyenchip.com/2022/02/07/data-distribution-shifts-and-monitoring.html – An impressive article about data distribution shifts and monitoring, originally created for a Stanford course.
Video & Podcast
https://youtu.be/Oo3zlOTbN2E – How do you adopt agile in your company, department or team? First you need a really good strategy, otherwise you will fail. This video provides one and also describes the inconveniences that might challenge you. (rcmd by reader)
https://youtu.be/vSnCeJEka_s – Lots of companies have agile, but sometimes it is only a disguised waterfall. What are the typical dysfunctions, and what does it mean to truly go full agile? (rcmd by reader)
Papers & Books
https://arxiv.org/abs/0911.4863 – Statistical exponential families.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3867326 – Causal ML and its importance for business.
Behind the Fence
https://recyclist.co/careers/#data – Data Engineer at Recyclist, Truckee, CA, USA.