DSB #123

Hi,

Yeah, I know, it’s Monday, not Friday… but you can still enlighten yourself with DSB! Have a look, for instance, at the article about the exponential age in Business and Career or the one on trends in data infrastructure in MLOps.

And as always, enjoy your reading.

Analytical

https://syncedreview.com/2017/10/22/tree-boosting-with-xgboost-why-does-xgboost-win-every-machine-learning-competition/ – Old but gold, and you use XGBoost for everything anyway, so you can at least learn why. (rcmd by reader)

https://www.analyticsvidhya.com/blog/2021/10/leveraging-pytorch-to-speed-up-deep-learning-with-gpus – How to use a GPU in PyTorch and why it speeds up deep learning.

https://justindomke.wordpress.com/2021/09/28/the-human-regression-ensemble/ – A funny but definitely not silly article that compares human-made predictions with regression methods on small datasets.

Computer Science & Science

https://findwork.dev/blog/advanced-usage-python-requests-timeouts-retries-hooks/ – A summary of `requests` features that are useful when writing web scraping tools or consuming JSON APIs. (rcmd by reader)
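For a taste of the topic, here is a minimal sketch of a `requests` session that uses a default timeout and retries automatically. It assumes that the `requests` and `urllib3` packages are installed. `TimeoutHTTPAdapter` and `make_session` are our own hypothetical helper names, and the retry settings are only example values.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class TimeoutHTTPAdapter(HTTPAdapter):
    """Hypothetical adapter: applies a default timeout when the caller passes none."""

    def __init__(self, *args, timeout=5, **kwargs):
        self.timeout = timeout  # default timeout in seconds
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.send passes timeout=None when the caller did not set one
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def make_session(timeout=5, retries=3):
    # Retry on typical transient HTTP statuses, with exponential backoff between attempts
    retry = Retry(
        total=retries,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = TimeoutHTTPAdapter(timeout=timeout, max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

Usage is the same as with a plain session, for example `make_session().get("https://example.com/api")`. Mounting the adapter on the session means you don't have to repeat `timeout=` on every call.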

https://github.com/mattiasgustavsson/dos-like – If you want to create a DOS-style game, try the dos-like framework.

https://realpython.com/python310-new-features/ – New features in Python 3.10.

Graphs and Visualizations

https://www.seznamzpravy.cz/clanek/kam-se-presunuli-volici-babis-vyluxoval-sve-mozne-spojence-177494 – A nice Sankey diagram showing how Babiš hoovered up the voters of his potential allies. It also shows, for example, where the Pirates’ voters went, along with other shifts. (rcmd by reader)

https://www.irozhlas.cz/volby/preference-pirati-stan-spolu-zakrouzkuj-zenu_2110101630_jab – As always, the data team at iRozhlas made nice election charts and also prepared a detailed electoral map.

https://vallandingham.me/seriesheat/#/ – A heatmap of ratings for every episode of any TV series on IMDb.

Business and Career

https://www.wired.co.uk/article/exponential-age-azeem-azhar – Nowadays everything is exponential. We live in the exponential age, and we can be players or spectators: adapt or perish. “Companies that couldn’t keep up would be undone at remarkable speed.”

https://janezhang.ca/posts/failing-to-freelance-in-dataviz/ – The story of a data visualization freelancer, without a happy ending: why it’s so difficult and what her mistakes were.

https://towardsdatascience.com/how-to-build-your-data-analytics-team-1276d6729ac4 – Building a data analytics team is never easy peasy, so focus on the right steps and tasks.

Pop

https://www.ondrejslama.cz/jak-influenceri-pomohli-zachranit-volby – What happens when marketers and influencers team up to bring young people to the polls.

https://www.euronews.com/next/2021/10/08/new-robots-patrolling-for-anti-social-behaviour-causing-unease-in-singapore-streets – Xavier is a robot that patrols the streets of Singapore. It’s not RoboCop yet, but we’ll get there in no time.

https://www.theverge.com/2021/10/6/22712365/twitch-data-leak-breach-security-confirmation-comments – Twitch was hacked and suffered a data breach (so now you can easily find out how much streamers earn), caused by an “error in a Twitch server configuration change”. And yes, Twitch is owned by Amazon.

Education

https://programminghistorian.org/en/lessons/clustering-with-scikit-learn-in-python – An impressive article on how to do clustering with scikit-learn. It covers K-means, of course, but also the DBSCAN algorithm and many useful metrics and steps you need to take into account.

https://www.analyticsvidhya.com/blog/2021/10/an-introduction-to-problem-solving-using-search-algorithms-for-beginners – A nice list of search algorithms, with Python code.
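To give an idea of the simplest uninformed search, here is a breadth-first search sketch in plain Python. The toy graph, its node names, and the `bfs_path` function are made up for illustration. In an unweighted graph, BFS returns a path with the fewest edges.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None if unreachable."""
    frontier = deque([start])   # FIFO queue: explore nodes in order of distance
    parent = {start: None}      # visited set that also remembers how we got there
    while frontier:
        node = frontier.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                frontier.append(neighbour)
    return None


# Hypothetical toy graph as an adjacency list
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"], "F": []}
print(bfs_path(graph, "A", "F"))  # ['A', 'B', 'D', 'F']
```

If you swap the `deque` for a stack, you get depth-first search. If you swap it for a priority queue keyed on path cost, you get uniform-cost search.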

https://github.com/kenjihiranabe/The-Art-of-Linear-Algebra/blob/main/The-Art-of-Linear-Algebra.pdf – Have a look at these visualizations of matrix multiplication and its application in matrix factorizations.
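One standard view of matrix multiplication that makes factorizations easier to grasp: AB equals the sum of rank-1 outer products, column k of A times row k of B. Below is a small plain-Python check comparing this view with the usual row-times-column rule. The helper names and toy matrices are ours, for illustration only.

```python
def matmul_rows(A, B):
    """Usual rule: entry (i, j) is row i of A dotted with column j of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def outer(u, v):
    """Rank-1 matrix u v^T."""
    return [[ui * vj for vj in v] for ui in u]


def matmul_outer(A, B):
    """Same product built as a sum of outer products: (column k of A)(row k of B)."""
    n, m = len(A), len(B[0])
    C = [[0] * m for _ in range(n)]
    for k in range(len(B)):
        col_k = [row[k] for row in A]
        rank1 = outer(col_k, B[k])
        C = [[C[i][j] + rank1[i][j] for j in range(m)] for i in range(n)]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_outer(A, B))  # [[19, 22], [43, 50]]
```

Factorizations such as the SVD write a matrix as exactly this kind of sum of rank-1 pieces. Keeping only the largest pieces gives a low-rank approximation.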

Data & Libraries

https://dlg4nlp.github.io/index.html – Graph4NLP is a deep graph learning library for natural language processing. Sounds interesting. (rcmd by reader)

https://koaning.io/til/dnd-data/ – There are datasets with transcriptions of people playing Dungeons and Dragons, and you can build models on them, analyze them, or do anything else you like! (rcmd by reader)

https://juba.github.io/robservable/ – robservable is an R package that can display an Observable notebook, or parts of it, as an htmlwidget. And it is simply amazing.

MLOps

https://mattturck.com/data2021/ – Trends in data infrastructure: what was hot last year and what is on fire now? A long and comprehensive article.

https://www.analyticsvidhya.com/blog/2021/10/a-complete-guide-on-docker-for-beginners – Basic intro to Docker.

https://www.kdnuggets.com/2021/09/nine-tools-mastered-before-phd-machine-learning.html – Nine very useful tools for data science, such as the already mentioned Docker, or Lucidchart, which is basically draw.io on steroids.

Video & Podcast

https://www.youtube.com/c/TechWorldwithNana – Tech World with Nana is one of the best YouTube channels about DevOps engineering. Especially (but not only) for beginners. Spend some time here, you won’t regret it. (rcmd by reader)

https://cds.nyu.edu/deep-learning/ – Deep Learning course at the NYU Center for Data Science.

Papers & Books

https://arxiv.org/pdf/2110.01889.pdf – An overview of state-of-the-art deep learning methods for tabular data, grouped into three categories: data transformations, specialized architectures, and regularization models.

https://arxiv.org/pdf/2110.02932.pdf – This paper shows what ML looks like in smaller or non-tech companies. How do you do ML with limited resources?

Behind the Fence

https://careers.ibm.com/job/13522698/entry-level-data-scientist-2022-remote – Entry-level Data Scientist at IBM in the USA, remote.

Joke

https://i.redd.it/2gu709kha1s71.jpg – Same with data science projects…
