Hopefully you have survived Monday and are ready for something much better, like DSB! For every data scientist, the pandas antipatterns article from the Analytical section is very short but also very useful. You can start there and then move on to a very interesting report about U.S. teens and their use of technology and social media. But no matter what you pick, any choice is great!
And as always, enjoy your reading.
Analytical
https://www.aidancooper.co.uk/pandas-anti-patterns/ – Pandas antipatterns. Don’t mutate, don’t use for loops, and use correct data types. A short but useful article.
https://eng.lyft.com/causal-forecasting-at-lyft-part-1-14cca6ff3d6d – The first part of a series on Lyft’s Causal Forecasting System. It describes this core data science problem at Lyft; the next post, not released yet, should be more analytical.
https://www.rpisoni.dev/posts/cossim-convolution/ – Sharpened Cosine Distance for better feature extraction, implemented as a class in Python.
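To make the three pandas tips above a bit more concrete, here is a minimal sketch. The toy DataFrame, the column names and both helper functions are made up for illustration and are not taken from the linked article; the only point is to contrast a row-by-row loop with a vectorized, non-mutating version that also fixes the data types.

```python
import pandas as pd

# Hypothetical toy data (an assumption for this sketch, not from the article).
df = pd.DataFrame({
    "city": ["Prague", "Brno", "Prague", "Ostrava"],
    "price": ["10.5", "20.0", "7.25", "3.0"],  # numbers stored as strings
    "qty": [2, 1, 4, 3],
})


def add_total_loop(frame):
    """Antipattern: iterate row by row and convert types inside the loop."""
    out = frame.copy()
    totals = []
    for _, row in out.iterrows():
        totals.append(float(row["price"]) * row["qty"])
    out["total"] = totals
    return out


def add_total_vectorized(frame):
    """Fix the dtypes once, then compute the whole column at once.

    astype() and assign() both return new DataFrames, so the input
    frame is never mutated.
    """
    return (
        frame
        .astype({"price": "float64", "city": "category"})
        .assign(total=lambda d: d["price"] * d["qty"])
    )


slow = add_total_loop(df)
fast = add_total_vectorized(df)
print(fast)
```

Both functions return the same totals. The second one keeps the original df untouched and stores city as a category, which usually saves memory on columns with many repeated values.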
Computer Science & Science
https://towardsdatascience.com/r-shiny-is-coming-to-python-1653bbe231ac – RStudio is becoming Posit, with a promise of more Python-focused development. One example is the fresh news, unveiled at rstudio::conf(2022), that Shiny is coming to Python. (rcmd by reader)
https://eugeneyan.com/writing/uncommon-python/ – When to use super(), what a mixin class is, how to use relative imports, what to write in __init__.py, and many more interesting Python concepts.
https://github.com/readme/featured/java-programming-language – Java is still going strong! Do not underestimate it! There is still high demand for this popular programming language and it has its claws in multiple solutions and companies.
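For the super() and mixin concepts mentioned in the Python article above, a small sketch may help. The class names here (JsonMixin, Animal, Dog) are hypothetical and only illustrate the general idea; they do not come from the linked post.

```python
import json


class JsonMixin:
    """A mixin adds one capability and keeps no state of its own."""

    def to_json(self):
        # vars(self) is the instance's attribute dictionary.
        return json.dumps(vars(self), sort_keys=True)


class Animal:
    def __init__(self, name):
        self.name = name


class Dog(JsonMixin, Animal):
    def __init__(self, name, tricks):
        # super() follows the method resolution order (MRO):
        # Dog -> JsonMixin -> Animal -> object.
        # JsonMixin defines no __init__, so Animal.__init__ runs here.
        super().__init__(name)
        self.tricks = tricks


dog = Dog("Rex", ["sit"])
print(dog.to_json())
print([cls.__name__ for cls in Dog.__mro__])
```

Listing the mixin first in the base classes is a common convention, so that its methods are found before those of the main base class.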
Graphs and Visualizations
https://www.wired.com/story/striking-graphs-that-show-humanitys-domination-of-the-earth/ – Several graphs showing how humans affect the world, from global warming to the production of concrete.
https://filwd.substack.com/p/theory-in-vis – Theory of data visualization. Is there any? Or should you build one by yourself?
https://medium.com/@tanyamarleytsui/shady-streets-6dad0979c13a – Amusing! When it’s hot and you want to stay out of the sun on your way home, you just write an algorithm to find the shadiest route!
Business and Career
https://www.pewresearch.org/internet/2022/08/10/teens-social-media-and-technology-2022/ – Impressive and very interesting report about how teens in the USA are using social media and technology. Lots of interesting numbers… e.g. 67 % of U.S. teens have used TikTok, but only 32 % have been on Facebook.
https://twitter.com/VoltarCH/status/1554075352359657474 – Astrophysicist Oliver Müller describes his nine months of experience at Google and the differences between the academic and private spheres.
https://www.vox.com/the-goods/2022/8/11/23298175/buy-now-pay-later-affirm-klarna-late-fees-cfpb – BNPL (buy now, pay later) is an extremely popular service in the fintech world. Is it really as good as it sounds? If you know which articles DSB likes, you also know the answer.
https://blog.peer-review.io/we-might-have-a-way-to-fix-scientific-publishing – Peer Review is a new platform for academic papers, currently in alpha. It should make scientific and academic publishing more accessible, democratic and open. A very ambitious project. A map of the project is also available.
https://www.wired.com/story/machine-learning-reproducibility-crisis/ – ML is popular not only in the business world, but in the scientific one as well. And it often produces misleading results because of scientists’ poor knowledge or judgement. “You couldn’t just throw it all into a big machine-learning model and see what comes out.”
https://www.noemamag.com/deep-learning-alone-isnt-getting-us-to-human-like-ai/ – Symbol manipulation. Probably the most important goal in ML now in order to achieve general AI. Article written by Gary Marcus himself.
https://superfastpython.com/python-concurrency-choose-api/ – When should you use the multiprocessing module, the threading module or the asyncio module? Read the intro to these 3 concurrency APIs.
https://huyenchip.com//2022/08/03/stream-processing-for-data-scientists.html – Real-time ML is not easy, so comprehensive intros like this one are more than helpful.
https://peterbloem.nl/blog/pca – Amazing series of articles about PCA, the spectral theorem, eigenvectors and SVD.
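As a rough companion to the concurrency article above, the sketch below puts the same tiny workload next to each of the three standard-library APIs. The function names and the workloads are assumptions made up for this example; the usual rule of thumb it illustrates is threading or asyncio for I/O-bound waiting, and multiprocessing for CPU-bound work.

```python
import asyncio
import multiprocessing
import threading
import time


def blocking_io(results, i):
    """Stand-in for a blocking I/O call (hypothetical workload)."""
    time.sleep(0.05)
    results[i] = i * 2


def run_threads(n=4):
    """threading: blocking I/O calls that overlap while they wait."""
    results = [None] * n
    threads = [
        threading.Thread(target=blocking_io, args=(results, i))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


async def async_io(i):
    """Stand-in for a non-blocking I/O call."""
    await asyncio.sleep(0.05)
    return i * 2


async def _gather(n):
    return await asyncio.gather(*(async_io(i) for i in range(n)))


def run_asyncio(n=4):
    """asyncio: many waiting tasks multiplexed on a single thread."""
    return asyncio.run(_gather(n))


def cpu_heavy(n):
    """Stand-in for CPU-bound work."""
    return sum(k * k for k in range(n))


def run_processes(sizes=(10_000, 20_000)):
    """multiprocessing: CPU-bound work spread across processes."""
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(cpu_heavy, sizes)
```

Calling run_threads() or run_asyncio() returns the same list of doubled indices. run_processes() is best called from a script guarded by if __name__ == "__main__":, because some platforms start worker processes by re-importing the main module.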
Datasets & Libraries
https://github.com/mljar/mljar-supervised – MLJAR Automated Machine Learning for Humans is an AutoML Python package with focus on tabular data.
https://github.com/gradio-app/gradio – Gradio is a Python library to create a user interface for your ML models and apps. If you are allowed to create them.
https://www.montecarlodata.com/blog-data-quality-survey-2022/ – Data people spend 40 % of their time evaluating or checking data quality.
https://www.reddit.com/r/datascience/comments/wla15x/nobody_talks_about_all_of_the_waiting_in_data/ – Data science is sometimes full of waiting. While your pipeline is running, go to this Reddit thread and get inspired about what to do in the meantime, or even better, how to prevent the waiting.
Video & Podcast
https://retronation.cz/wolfcast-62-cesta-k-umele-inteligenci-1/ – An excellent podcast (in Czech) by Michal Rybka, in which he wanders back to the roots of AI, deep into antiquity. In the follow-up second episode he then gets to modern history as many of us already know it.
Papers & Books
https://dl.acm.org/doi/pdf/10.1145/3542921 – A paper that describes how practitioners develop mental models of AI behavior.
https://www.paperdigest.org/2022/08/kdd-2022-highlights/ – A digest of this year’s papers from the Conference on Knowledge Discovery and Data Mining (KDD).
Behind the Fence
https://careers.addisongroup.com/job/data-scientist-information-technology-oklahoma-city-ok-533458/873c8c50-f245-11ec-8e78-42010a8a0fd9 – Data Scientist position offered by Addison in Oklahoma, USA.
https://i.redd.it/pow1shl5zfh91.jpg (rcmd by reader)