It’s still Friday and it’s still time to start this great show called DSB! I recommend an article about the difficulties of becoming a data-driven company. It’s a must-read for everybody; you will find it in Business and Career.
As always, enjoy your reading.
https://nhigham.com/2020/07/07/what-is-stochastic-rounding/ – Learn about stochastic rounding. More examples can be found here or here. Simple code in R is here or the package for Python is here. (rcmd by reader)
https://towardsdatascience.com/how-to-work-with-object-detection-datasets-in-coco-format-9bf4fb5848a4 – If you want to try object detection, the COCO dataset is for you, but how do you work with it?
https://ruder.io/recent-advances-lm-fine-tuning/ – Fine-tuning methods for large pre-trained language models.
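For the stochastic rounding link above, here is a minimal Python sketch of the idea: a number is rounded to one of its two neighbouring integers, and it goes up with probability equal to its fractional part, so the result is right on average. The function name stochastic_round and the rng parameter are our own illustration and are not taken from the linked article or packages.

```python
import math
import random


def stochastic_round(x, rng=random):
    """Round x to a neighbouring integer, unbiased in expectation."""
    lower = math.floor(x)
    frac = x - lower  # distance from the lower neighbour, in [0, 1)
    # Round up with probability equal to the fractional part.
    return lower + (1 if rng.random() < frac else 0)


# Seeded generator so the demo is repeatable.
rng = random.Random(0)
samples = [stochastic_round(0.3, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
print(mean)  # close to 0.3, while round(0.3) would always give 0
```

Ordinary rounding would turn every 0.3 into 0 and lose the value entirely; the stochastic version keeps it on average, which is why it is interesting for low-precision arithmetic.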
Computer Science & Science
https://syncedreview.com/2021/02/01/google-brain-introduces-symbolic-programming-pyglove-library-to-reformulate-automl/amp/ – AutoML based on symbolic programming from Google, and their Python library PyGlove, which unfortunately is currently not public. (rcmd by reader)
https://nicolaiarocci.com/musings-on-pythons-pattern-matching/ – Python introduced pattern matching, as we told you in the last DSB. But not everybody is happy about that. (rcmd by reader)
https://segment.com/blog/goodbye-microservices/ – In bulletins #86 and #106 we mentioned that microservices have their dark sides, and that still applies. More about migration to microservices here and about the modular monolith as an intermediate stage here. (rcmd by reader)
Graphs and Visualizations
http://www.maartenlambrechts.com/2015/05/03/to-the-point-7-reasons-you-should-use-dot-graphs.html – Let’s see why dots are important in visualizations and what their role is. (rcmd by reader)
https://archive.org/details/graphicmethodsfo00brinrich/page/342/mode/2up – Amazing: more than 100 years ago, statistical exhibits were held in New York to present statistical information. (rcmd by reader)
https://rpubs.com/bpbond/727258 – ggplot2 rules, and here is some intermediate-level material on how to improve your visualizations in R.
Business and Career
https://hbr-org.cdn.ampproject.org/c/s/hbr.org/amp/2021/02/why-is-it-so-hard-to-become-a-data-driven-company – If you want to be data driven, you really need to change your culture or you will fail. You need to handle your “organizational alignment, business processes, change management, communication, people skill sets, and resistance or lack of understanding to enable change. Culture eats strategy for breakfast.” (rcmd by reader)
https://www.analyticsvidhya.com/blog/2021/02/comprehensive-guide-data-science-professional/ – We have all seen maybe too many articles trying to explain how to become a data scientist. This one is different. It’s comprehensive and not only explains the path but also contains some interesting tips, even though it serves mainly as an ad for an online course.
https://boxmining.com/ethereum-2/ – Ethereum 2.0: what it is all about, how it works, and why you should care. A very technical article.
https://cloud.google.com/blog/products/ai-machine-learning/how-waze-predicts-carpools-using-google-cloud-ai-platform – A high-level description of the Waze carpool recommendation system. In the last part, they mention interesting Google services such as Explainable AI and AI Platform. (rcmd by reader)
https://www.reuters.com/article/amp/idUSKBN2AP1AC – Disputes between Google and its scientists about papers that were rejected. Google even tried to change the wording of papers to make itself look better. (rcmd by reader)
https://www.jetbrains.com/lp/python-developers-survey-2020/ – Did you also contribute to the Python Developers Survey by JetBrains? Then look at the results, they are more than interesting. (rcmd by reader)
https://www.analyticsvidhya.com/blog/2021/02/a-simple-guide-to-metrics-for-calculating-string-similarity/ – Different distances for measuring the similarity between strings.
https://www.analyticsvidhya.com/blog/2021/02/new-approach-for-regression-analysis-ransac-and-mlesac/ – Random Sample Consensus (RANSAC) and Maximum Likelihood Estimator Sample Consensus (MLESAC) for regression analysis.
https://mikkel.ca/blog/git-is-my-buddy-effective-solo-developer/ – There will never be too many articles on how to use Git, so read this one; you won’t regret it. It contains some useful principles.
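To make the string-similarity link above a bit more concrete, here is a minimal sketch of one widely used string distance, the Levenshtein (edit) distance: the smallest number of single-character insertions, deletions, and substitutions needed to turn one string into another. We are assuming this is representative of the kind of metric the guide covers; the function name levenshtein is our own choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, using two rows of the DP table."""
    # Keep the shorter string in the inner loop to save memory.
    if len(a) < len(b):
        a, b = b, a
    # Distance from the empty prefix of a to every prefix of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("flaw", "lawn"))       # 2
```

A distance of 0 means the strings are identical; the larger the number, the less similar they are.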
Data & Libraries
https://github.com/TheAlgorithms/Python – So many algorithms written in Python for learning purposes. TheAlgorithms also offers repositories for other languages. (rcmd by reader)
https://github.com/huggingface/knockknock – A library that notifies you when your model has finished training. Simple and perfect.
https://github.com/nasa/fprime/tree/devel/Gds – NASA shares a repository called fprime; this part is about the GDS (Ground Data System), which is essential for navigating spacecraft such as Perseverance. And yes, they are using Flask! (rcmd by reader)
Video & Podcast
https://www.youtube.com/watch?v=hgI0p1zf31k&ab_channel=PythonDiscord – A song about PEP 8. Yes, you heard me right, it’s a song about PEP 8! (rcmd by reader)
https://open.spotify.com/episode/7FRfhVzQ6v6j3NnfCj9sjq?si=j5mzDPn2QUCW3JRNBqF9zg&nd=1 – How digital channels work at Moneta. (rcmd by reader)
https://open.spotify.com/episode/7o8go7sTHwmIRGdotsm4eI?si=kK__pKa-SpuAV5IRuvuqXw&nd=1 – About media communication at KB, and here, for a change, you will find a podcast directly from KB, this time about their data. (rcmd by reader)
Papers & Books
https://syncedreview.com/2021/02/04/aaai-2021-best-papers-announced/amp/ – Best papers from the AAAI Conference on Artificial Intelligence, AAAI-21. (rcmd by reader)
Behind the Fence
https://careers.mozilla.org/position/gh/2573014/ – Data scientist at Mozilla, home office anywhere in the USA or Canada.