
DSB #134

Hi,

Long time no see, but this Sunday you get a DSB issue full of interesting reading. I especially recommend the article in the Pop section that tries to explain ML through the laws of thermodynamics. Also, don't skip the MLOps piece about σ-driven project management.

And as always, enjoy your reading.

Analytical

https://explosion.ai/blog/bloom-embeddings – Are your vectors too big to fit? Try probabilistic data structures. The authors call it a cheat, but it works. Very dense reading.
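To get a feel for the trick, here is a minimal Python sketch, not the code from the post: each key is hashed several times into a small shared table, and its vector is the sum of the selected rows, so two distinct keys rarely end up with the same vector. The class name `BloomEmbedding`, the salted blake2b hashing and the table size are assumptions for illustration.

```python
import hashlib
import random


class BloomEmbedding:
    """Hypothetical sketch of a Bloom-style embedding table.

    Instead of one row per vocabulary entry, a key is mapped to
    `num_hashes` rows of a small table, and its vector is their sum.
    """

    def __init__(self, num_rows: int, dim: int, num_hashes: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.num_rows = num_rows
        self.num_hashes = num_hashes
        # Small shared table with randomly initialised rows.
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_rows)]

    def _rows(self, key: str) -> list[int]:
        # Derive several hash functions by salting the key with the hash index
        # (an assumption; any family of independent hashes would do).
        rows = []
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{key}".encode("utf-8"), digest_size=8).digest()
            rows.append(int.from_bytes(digest, "little") % self.num_rows)
        return rows

    def __call__(self, key: str) -> list[float]:
        # Sum the selected rows into the key's vector.
        vec = [0.0] * len(self.table[0])
        for r in self._rows(key):
            for j, value in enumerate(self.table[r]):
                vec[j] += value
        return vec
```

For example, `BloomEmbedding(num_rows=1000, dim=8)("apple")` returns an 8-dimensional vector while the table holds only 1,000 rows, however large the vocabulary grows.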

https://sebastianraschka.com/blog/2022/confidence-intervals-for-ml.html – Different methods for creating confidence intervals for ML models in Python.
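As a taste of the simplest option, here is a minimal plain-Python sketch of a percentile bootstrap interval for test-set accuracy. The function name `bootstrap_accuracy_ci` and its defaults are hypothetical; the article walks through this and other methods in more detail.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, n_rounds=1000, alpha=0.05, seed=123):
    """Percentile bootstrap confidence interval for accuracy (hypothetical helper).

    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    scores = []
    for _ in range(n_rounds):
        # Resample test examples with replacement and recompute accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(sample) / n)
    scores.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap scores.
    lower = scores[int(round(alpha / 2 * n_rounds))]
    upper = scores[int(round((1 - alpha / 2) * n_rounds)) - 1]
    return sum(correct) / n, lower, upper
```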

https://www.reddit.com/r/MachineLearning/comments/tnowi9/d_how_do_you_defend_the_choice_of_ml_algorithm/ – Don’t underestimate this Reddit thread. It contains many interesting approaches to choosing a suitable ML algorithm.

Computer Science & Science

https://www.crunchydata.com/blog/parquet-and-postgres-in-the-data-lake – What is the secret of the Parquet format, and how does it differ from PostgreSQL?

https://github.com/pyscript/pyscript – PyScript is one of the latest big things in the Python world. Now you can run Python code right in HTML. There is also an intro article with more information.

https://davidamos.dev/revisiting-rock-paper-scissors-in-python/ – The difference between a beginner's and an advanced Python solution, very nicely illustrated with the rock-paper-scissors game.
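For a taste of the more compact style, here is one way to decide a round without a chain of if/elif branches, using modular arithmetic over the cyclic order of the choices. It is an illustrative sketch, not necessarily the solution from the article; the names `CHOICES` and `winner` are hypothetical.

```python
CHOICES = ["rock", "paper", "scissors"]


def winner(player: str, opponent: str) -> str:
    """Return "player", "opponent" or "draw" for one round.

    Each choice beats the one just before it in CHOICES (wrapping around),
    so (i - j) % 3 encodes the outcome: 0 = draw, 1 = player wins,
    2 = opponent wins.
    """
    i, j = CHOICES.index(player), CHOICES.index(opponent)
    diff = (i - j) % 3
    if diff == 0:
        return "draw"
    return "player" if diff == 1 else "opponent"
```

Adding a new rule then means changing data (the order of `CHOICES`) rather than rewriting branches.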

Graphs and Visualizations

https://www.bellingcat.com/news/2022/03/17/hospitals-bombed-and-apartments-destroyed-mapping-incidents-of-civilian-harm-in-ukraine/ – Interactive map showing attacks on civilian targets in Ukraine. (rcmd by reader)

https://twitter.com/waitbutwhy/status/1519955771533905920?s=20&t=k6DCH8BKgncVIA1eViqsQw – A simple graph showing the age of the oldest living person on Earth since 1955.

https://www.allendowney.com/blog/2022/05/02/how-gaussian-is-it/ – How to properly visualize a distribution, why some distributions are not Gaussian, and when to use a CDF or a KDE plot.
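In the spirit of comparing CDFs, here is a small sketch (our own, not from the post) that measures the largest vertical gap between the empirical CDF of a sample and the CDF of a normal distribution fitted with `statistics.NormalDist.from_samples`. The helper name `max_cdf_gap` is hypothetical.

```python
from statistics import NormalDist


def max_cdf_gap(data):
    """Largest gap between the empirical CDF of `data` and the CDF of a
    normal distribution with the same mean and standard deviation."""
    xs = sorted(data)
    n = len(xs)
    model = NormalDist.from_samples(xs)
    gap = 0.0
    for k, x in enumerate(xs, start=1):
        # The empirical CDF jumps from (k - 1)/n to k/n at x; check both sides.
        f = model.cdf(x)
        gap = max(gap, abs(k / n - f), abs((k - 1) / n - f))
    return gap
```

A small gap suggests the data are close to Gaussian; plotting both CDFs then shows where any mismatch sits, typically in the tails.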

Business and Career

https://neilmitchell.blogspot.com/2022/05/working-on-build-systems-full-time-at.html – What is it like to move from a finance company to Meta? What are the differences in corporate culture or in team management?

https://scientistemily.substack.com/p/inclusive-data-science-hiring – I like that some companies think deeply about the hiring process and even have a methodology for it. This article may give you ideas on how to hire data scientists.

https://www.adalovelaceinstitute.org/report/regulating-ai-in-europe/ – Difficult and unpleasant reading, but the Ada Lovelace Institute analyzes and criticizes the EU's AI Act (mentioned in DSB #115), which will affect data science across the whole EU and maybe even beyond.

Pop

https://towardsdatascience.com/a-physicists-view-of-machine-learning-the-thermodynamics-of-machine-learning-6a3ab00e46f1 – A very interesting article that looks at ML through the eyes of thermodynamics.

https://howisfelix.today – Howisfelix is a project by Felix Krause. He decided to collect various data about his life and make them publicly available on a compelling, visually attractive website. Of course, there is also a public GitHub repository.

https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/ – The Dunning-Kruger effect does not exist if it is measured properly.
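The autocorrelation argument is easy to reproduce with a quick simulation, sketched here under our own assumptions: actual and self-assessed scores are drawn independently and uniformly as percentiles from 0 to 100, so there is no real effect at all. Grouping people by actual score still produces the familiar chart. The function name and parameters are hypothetical.

```python
import random
from statistics import mean


def simulate_dunning_kruger(n=10_000, seed=42):
    """Return (mean actual, mean self-assessed) per actual-score quartile,
    for pure noise: both scores are independent uniform percentiles."""
    rng = random.Random(seed)
    people = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]
    people.sort(key=lambda p: p[0])  # sort by actual score
    size = n // 4
    quartiles = []
    for q in range(4):
        group = people[q * size:(q + 1) * size]
        quartiles.append((mean(p[0] for p in group), mean(p[1] for p in group)))
    return quartiles
```

The bottom quartile appears to overestimate itself and the top quartile to underestimate itself, only because the actual score is used both for grouping and in the comparison, while self-assessments stay around 50 everywhere.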

Education

https://machinelearningmastery.com/calculus-for-machine-learning-7-day-mini-course/ – A mini-course on calculus for ML that should take only 7 days.

https://lakens.github.io/statistical_inferences/index.html – Go to this link if you want to improve your statistical inferences. The site brings together material from the blog, the MOOCs Improving Your Statistical Inferences and Improving Your Statistical Questions, and the scientific work of Daniël Lakens from Eindhoven University of Technology.

https://hazyresearch.stanford.edu/blog/2022-04-19-contrastive-1 – The first of three parts about contrastive learning, an approach recently used for training ML models.
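For readers new to the topic, here is a minimal sketch of InfoNCE, one widely used contrastive objective: an anchor should score its positive pair higher than the negatives, with similarity measured as cosine similarity divided by a temperature. Plain-Python vectors and the default `temperature=0.1` are assumptions for illustration; the blog series may use other formulations.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: cross-entropy of picking the positive
    among the positive plus the negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]
```

The loss is close to zero when the positive is much more similar to the anchor than every negative, and grows when a negative looks more similar than the positive.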

MLOps

https://erikbern.com/2022/04/05/sigma-driven-project-management-when-is-the-optimal-time-to-give-up.html – The effect of uncertainty on the completion of a (software) project. When is it reasonable to stop a project? Learn about σ-driven project management.
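One way to see why uncertainty matters, sketched under our own simplifying assumption that a project's total duration T is log-normal with parameters `mu` and `sigma`: the expected remaining time E[T - t | T > t] has a closed form. For small `sigma` it shrinks as work goes on, but for large `sigma` it can grow, which is one intuition for why giving up can be rational. The function names are hypothetical, and this is not necessarily the exact model used in the post.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_remaining(elapsed: float, mu: float, sigma: float) -> float:
    """E[T - t | T > t] for T ~ LogNormal(mu, sigma) and t = elapsed > 0."""
    log_t = math.log(elapsed)
    # P(T > t)
    tail = 1.0 - normal_cdf((log_t - mu) / sigma)
    # E[T * 1{T > t}], the partial expectation of a log-normal variable
    partial = math.exp(mu + sigma ** 2 / 2) * normal_cdf((mu + sigma ** 2 - log_t) / sigma)
    return partial / tail - elapsed
```

For example, with `mu=0` and `sigma=2`, the expected remaining time after 10 units of work is several times larger than at the start, while with `sigma=0.25` it drops as work progresses.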

https://medium.com/qe-unit/airbnbs-microservices-architecture-journey-to-quality-engineering-d5a490e6ba4f – How Airbnb combines microservice and macroservice architecture to achieve high engineering quality.

https://twitter.com/sh_reya/status/1521903041003225088 – A Twitter thread by Shreya Shankar about MLOps principles for any ML platform.

Video & Podcast

https://www.youtube.com/c/Coreyms – If you want to get better at Git, OOP, and more, definitely try this YouTube channel by Corey Schafer, which focuses mainly on Python and JavaScript. (rcmd by reader)

https://www.youtube.com/c/twimlai – A vidcast about ML and AI by Sam Charrington. For some reason I prefer vidcasts to podcasts, so if you do too, this one is perfect for you.

https://youtube.com/playlist?list=PLAm5TIX-yz7LJKkE-hzEWiIJpAFPmB19A – Outlier 2022 is a conference focused on data visualization, and its talks are now available to watch.

Papers & Books

https://hdsr.mitpress.mit.edu/pub/oraonikr/release/1 – An impressive interactive article about the visualizations created by Elizabeth Palmer Peabody, an American educator who lived in the 19th century.

https://github.com/rougier/scientific-visualization-book – A PDF book about scientific visualization in Python with Matplotlib. Inspiring.

https://r-graphics.org/ – Second edition of R Graphics Cookbook.

Behind the Fence

https://boards.greenhouse.io/hungryroot/jobs/4313168004 – Data Scientist at Hungryroot, NYC, USA.

Joke

https://programmerhumor.io/wp-content/uploads/2022/05/programmerhumor-io-linux-memes-programming-memes-2006eccb70c22d8-758x1529.jpg
