Hi,
yeah, I know, it’s not Friday but Monday… but you can still enlighten yourself with DSB! Have a look, for instance, at the article about the exponential age in Business and Career or the one on trends in data infrastructure in MLOps.
And as always, enjoy your reading.
Analytical
https://syncedreview.com/2017/10/22/tree-boosting-with-xgboost-why-does-xgboost-win-every-machine-learning-competition/ – Old but gold, and you use XGBoost for everything anyway, so you can at least learn why. (rcmd by reader)
https://www.analyticsvidhya.com/blog/2021/10/leveraging-pytorch-to-speed-up-deep-learning-with-gpus – Use a GPU in PyTorch and read why it helps.
https://justindomke.wordpress.com/2021/09/28/the-human-regression-ensemble/ – A funny but definitely not silly article that compares human-made predictions with regression methods on small datasets.
Computer Science & Science
https://findwork.dev/blog/advanced-usage-python-requests-timeouts-retries-hooks/ – Summary of requests features that are useful when writing web scraping tools or using JSON APIs. (rcmd by reader)
https://github.com/mattiasgustavsson/dos-like – If you want to create a DOS-like game, use the dos-like framework.
https://realpython.com/python310-new-features/ – New features in Python 3.10.
Graphs and Visualizations
https://www.seznamzpravy.cz/clanek/kam-se-presunuli-volici-babis-vyluxoval-sve-mozne-spojence-177494 – A nice Sankey diagram showing how Babiš hoovered up his potential allies, as well as where the Pirates’ voters went and other shifts. (rcmd by reader)
https://www.irozhlas.cz/volby/preference-pirati-stan-spolu-zakrouzkuj-zenu_2110101630_jab – As always, the data journalists at iRozhlas made nice charts about the elections and also prepared a detailed election map.
https://vallandingham.me/seriesheat/#/ – Heatmap of ratings for each episode of any TV Series on IMDb.
Business and Career
https://www.wired.co.uk/article/exponential-age-azeem-azhar – Nowadays everything is exponential: we live in the exponential age and we can be players or spectators, adapt or perish. “Companies that couldn’t keep up would be undone at remarkable speed.”
https://janezhang.ca/posts/failing-to-freelance-in-dataviz/ – The story of a data visualization freelancer with no happy ending: why it is so difficult and what her mistakes were.
https://towardsdatascience.com/how-to-build-your-data-analytics-team-1276d6729ac4 – Building a data science team is never easy peasy, so focus on the right steps and tasks.
Pop
https://www.ondrejslama.cz/jak-influenceri-pomohli-zachranit-volby – When marketers and influencers join forces to get young people to the polls.
https://www.euronews.com/next/2021/10/08/new-robots-patrolling-for-anti-social-behaviour-causing-unease-in-singapore-streets – Xavier is a robot that patrols the streets of Singapore. It’s not a RoboCop yet, but we’ll get there in no time.
https://www.theverge.com/2021/10/6/22712365/twitch-data-leak-breach-security-confirmation-comments – Twitch was hacked and suffered a data breach (now you can easily learn how much streamers earn) because of an “error in a Twitch server configuration change”. And yes, Twitch is owned by Amazon.
Education
https://programminghistorian.org/en/lessons/clustering-with-scikit-learn-in-python – An impressive article on how to do clustering with scikit-learn. Of course, it covers K-means, but also the DBSCAN algorithm and many useful metrics and steps that you need to take into account.
https://www.analyticsvidhya.com/blog/2021/10/an-introduction-to-problem-solving-using-search-algorithms-for-beginners – Nice list of search algorithms with Python code.
https://github.com/kenjihiranabe/The-Art-of-Linear-Algebra/blob/main/The-Art-of-Linear-Algebra.pdf – Have a look at visualizations of matrix multiplication and their application in matrix factorizations.
Data & Libraries
https://dlg4nlp.github.io/index.html – Graph4NLP is a deep graph learning library for natural language processing. Sounds interesting. (rcmd by reader)
https://koaning.io/til/dnd-data/ – There are datasets with transcriptions of people playing Dungeons and Dragons, and you can build your models upon them or analyze them or anything! (rcmd by reader)
https://juba.github.io/robservable/ – robservable is an R package that can display a notebook or its parts as an htmlwidget. And it is simply amazing.
MLOps
https://mattturck.com/data2021/ – The trends in data infrastructure. What was the thing last year and what is currently on fire? Quite a long and comprehensive article.
https://www.analyticsvidhya.com/blog/2021/10/a-complete-guide-on-docker-for-beginners – Basic intro to Docker.
https://www.kdnuggets.com/2021/09/nine-tools-mastered-before-phd-machine-learning.html – Nine very useful tools for DS, like the already mentioned Docker, or Lucidchart, which is basically draw.io on steroids.
Video & Podcast
https://www.youtube.com/c/TechWorldwithNana – Tech World with Nana is one of the best YouTube channels about DevOps engineering. Especially (but not only) for beginners. Spend some time here, you won’t regret it. (rcmd by reader)
https://cds.nyu.edu/deep-learning/ – Deep Learning course at the NYU Center for Data Science.
Papers & Books
https://arxiv.org/pdf/2110.01889.pdf – Overview of state-of-the-art deep learning methods for tabular data, grouped into three categories: transformations, specialized architectures, and regularization models.
https://arxiv.org/pdf/2110.02932.pdf – This paper shows how ML looks in smaller companies or non-tech companies. How do you do ML with limited resources?
Behind the Fence
https://careers.ibm.com/job/13522698/entry-level-data-scientist-2022-remote – Junior Data Scientist at IBM in the USA, remote.
Joke
https://i.redd.it/2gu709kha1s71.jpg – Same with data science projects…