Long time no see, but this Sunday you get a DSB volume full of interesting reading. I recommend the article from Pop that tries to explain ML via the laws of thermodynamics. Also, do not skip the MLOps one about σ-driven project management.
And as always, enjoy your reading.
https://explosion.ai/blog/bloom-embeddings – Are your vectors too big to fit? Try probabilistic data structures. The authors call it a cheat, but it works. Very dense reading.
https://sebastianraschka.com/blog/2022/confidence-intervals-for-ml.html – Different methods for creating confidence intervals for ML models in Python.
https://www.reddit.com/r/MachineLearning/comments/tnowi9/d_how_do_you_defend_the_choice_of_ml_algorithm/ – Don’t underestimate this Reddit thread. It contains many interesting methods for choosing a proper ML algorithm.
Computer Science & Science
https://www.crunchydata.com/blog/parquet-and-postgres-in-the-data-lake – What is the secret of the Parquet format, and how does it differ from PostgreSQL?
https://davidamos.dev/revisiting-rock-paper-scissors-in-python/ – The difference between a beginner solution and an advanced solution in Python, nicely illustrated with the rock-paper-scissors game.
Graphs and Visualizations
https://www.bellingcat.com/news/2022/03/17/hospitals-bombed-and-apartments-destroyed-mapping-incidents-of-civilian-harm-in-ukraine/ – Interactive map showing attacks on civilian targets in Ukraine. (recommended by a reader)
https://twitter.com/waitbutwhy/status/1519955771533905920?s=20&t=k6DCH8BKgncVIA1eViqsQw – Simple graph showing the age of the oldest person on Earth since 1955.
https://www.allendowney.com/blog/2022/05/02/how-gaussian-is-it/ – How to properly visualize a distribution, why some distributions are non-Gaussian, and when to use a CDF or KDE plot.
Business and Career
https://neilmitchell.blogspot.com/2022/05/working-on-build-systems-full-time-at.html – What is it like to move from a finance company to Meta? What are the differences in corporate culture or in team management?
https://scientistemily.substack.com/p/inclusive-data-science-hiring – I like that some companies think deeply about the hiring process and even have a methodology for it. This article may inspire you on how to hire a data scientist.
https://www.adalovelaceinstitute.org/report/regulating-ai-in-europe/ – Difficult and unpleasant reading, but the Ada Lovelace Institute tackles and criticizes the EU's AI Act, mentioned in DSB #115, which will affect all of data science in the EU and maybe even beyond.
https://towardsdatascience.com/a-physicists-view-of-machine-learning-the-thermodynamics-of-machine-learning-6a3ab00e46f1 – Very interesting article that looks at ML through the eyes of thermodynamics.
https://howisfelix.today – Howisfelix is a project by Felix Krause. He decided to collect various data about his life and make it publicly available on a compelling and visually attractive website. Of course, there is also a public GitHub repository.
https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/ – Argues that the Dunning-Kruger effect does not exist if it is measured properly.
https://machinelearningmastery.com/calculus-for-machine-learning-7-day-mini-course/ – Mini course on calculus for ML that should take only 7 days.
https://lakens.github.io/statistical_inferences/index.html – Go to this link if you want to improve your statistical inference. The site collects material from the blog, the MOOCs Improving Your Statistical Inferences and Improving Your Statistical Questions, and the scientific work of Daniël Lakens from Eindhoven University of Technology.
https://hazyresearch.stanford.edu/blog/2022-04-19-contrastive-1 – First of three parts about contrastive learning, recently used for training ML models.
https://erikbern.com/2022/04/05/sigma-driven-project-management-when-is-the-optimal-time-to-give-up.html – The effect of uncertainty on the completion of a (software) project. When is it reasonable to stop the project? Learn about σ-driven project management.
https://medium.com/qe-unit/airbnbs-microservices-architecture-journey-to-quality-engineering-d5a490e6ba4f – Combination of micro- and macroservices architecture at Airbnb in order to achieve high quality in engineering.
https://twitter.com/sh_reya/status/1521903041003225088 – Twitter thread by Shreya Shankar about MLOps principles for every ML platform.
Video & Podcast
https://youtube.com/playlist?list=PLAm5TIX-yz7LJKkE-hzEWiIJpAFPmB19A – The Outlier 2022 conference focuses on data visualization, and you can now watch its videos.
Papers & Books
https://hdsr.mitpress.mit.edu/pub/oraonikr/release/1 – Impressive interactive article about visualizations created by the American educator Elizabeth Palmer Peabody. She lived in the 19th century, btw.
https://github.com/rougier/scientific-visualization-book – PDF book about scientific visualizations in Python with Matplotlib. Inspiring.
https://r-graphics.org/ – Second edition of R Graphics Cookbook.
Behind the Fence
https://boards.greenhouse.io/hungryroot/jobs/4313168004 – Data Scientist at Hungryroot, NYC, USA.