The weekend is almost here, but DSB is here right now! I would recommend everything, but reading about conditional density estimation (CDE) in Papers & Books can be especially useful, as can the eye-opening article about the realistic state of IT in the private sector in Computer Science & Science.
As always, enjoy your reading.
https://towardsdatascience.com/class-imbalance-random-sampling-and-data-augmentation-with-imbalanced-learn-63f3a92ef04a – Class imbalance is such a common problem that it is useful to have a look at imbalanced-learn. Includes code written in Python. (rcmd by reader)
https://ai.stanford.edu/blog/scalar-probing/ – How good is the sense of scale in pre-trained language models, and are they able to get this information from text alone? (rcmd by reader)
https://www.microprediction.com/blog/ – In DSB #70 we introduced you to Prophet, a Python and R library created by Facebook for automatic time-series prediction. Since then it has become very popular and has also drawn criticism of its accuracy. Should you use it?
Computer Science & Science
https://veekaybee.github.io/2019/05/10/java8/ – This report is two years old but still gold: it describes the realistic state of IT, not the fancy one you see on Twitter, Facebook or the web. (rcmd by reader)
https://www.kdnuggets.com/2021/03/15-common-mistakes-python.html – Common mistakes data scientists make in Python, for example relying on Jupyter Notebooks, which “…are really good for educational purposes and to do some quick and dirty job, but it fails to act as a good IDE.”
https://suade.org/dev/12-requests-per-second-with-python/amp/ – A comparison of multiple Python web frameworks. You will learn which frameworks exist, why the benchmarks usually presented are unrealistic, and much more about the topic itself. (rcmd by reader)
Graphs and Visualizations
https://www.statista.com/statistics/881541/bitcoin-energy-consumption-transaction-comparison-visa/ – A simple graph showing that a single Bitcoin transaction is more energy intensive than one hundred thousand VISA transactions. (rcmd by reader)
https://medium.com/airbnb-engineering/visualizing-data-timeliness-at-airbnb-ee638fdf4710 – Data timeliness visualizations at Airbnb for monitoring purposes. Amazing data engineering.
https://gifrun.com/ – A creator of high-quality GIFs from videos that you provide! 😀
Business and Career
– We mentioned Walmart's fintech ambitions in DSB #108 and they seem to be growing. Walmart wants a super app like WeChat: an ecosystem where you can do everything and anything.
https://jvns.ca/blog/things-your-manager-might-not-know/ – How to communicate with your manager, because she/he doesn’t know everything.
https://www.mx.com/ultimate-guides/fintech-data/ – The current status of (banking) data and fintechs in the USA, a small customer survey, and the future of banking.
https://aiindex.stanford.edu/report/ – Stanford's report on AI development. For one, private investment is growing. The whole report has more than 200 pages, but there are also brief highlights. (rcmd by reader)
https://www.noemamag.com/a-view-of-the-future-of-our-data/ – What are data coalitions? This is a very interesting article about how you could possibly control your personal data in the near future.
https://www.nytimes.com/2021/03/06/science/math-gresham-sarah-hart.html – A profile of mathematician Sarah Hart, who explores the intersections of music, literature and mathematics.
https://www.marktechpost.com/2021/03/03/introduction-to-reinforcement-learning/ – A nice and simple intro to reinforcement learning, complete with Python code. (rcmd by reader)
https://evidentlyai.com/blog/tutorial-1-model-analytics-in-production – Learn how to properly monitor the deterioration of model performance in production.
https://www.kdnuggets.com/2021/03/speed-up-scikit-learn-model-training.html – Tricks for speeding up scikit-learn model training.
Data & Libraries
https://arrow.readthedocs.io/en/latest/ – This has been needed for a long time, but at least Python now has a proper library for date and time manipulation. (rcmd by reader)
https://towardsdatascience.com/the-building-blocks-of-a-modern-data-platform-92e46061165 – Perhaps only a beginner’s guide to data platforms, but also a fairly comprehensive list of tools and services.
https://www.kdnuggets.com/2021/03/dask-pandas-data.html – Dask provides an option to use the pandas API! Be really fast!
Video & Podcast
https://www.sciencemag.org/news/2021/03/watch-winners-year-s-dance-your-phd-contest – The best videos from the Dance Your Ph.D. contest… for real 😀
https://www.youtube.com/watch?v=CntBTQLCaFc – MLOps for TinyML.
Papers & Books
https://arxiv.org/abs/1908.11523 – A paper about conditional density estimation and which tools in R and Python are suitable for it. (rcmd by reader)
Behind the Fence
https://live-web-assets.s3.amazonaws.com/Company+Website/Principal+Data+Scientist+-+Powerlytics.pdf – Principal Data Scientist in Philadelphia, USA.
https://xkcd.com/2435/ – Geothmetic meandian!