Hi,
Monday is close, but the DSB is closer! Have a look at a review article about this year in ML, AI and Data in Pop. Or spend a few minutes on a documentary about China and its surveillance in Video & Podcast.
And as always, enjoy your reading.
Analytical
https://julialang.org/blog/2021/10/DEQ/ – Deep Equilibrium Models mixed with Neural Ordinary Differential Equations implemented in Julia.
https://www.fast.ai/2021/10/17/control-groups/ – I love these articles. Statistical testing (and the interpretation that follows) is quite a difficult discipline, and this article shows one of the reasons why, using a paper about long Covid in children and adolescents.
https://www.kdnuggets.com/2021/10/automl-introduction-auto-sklearn-auto-pytorch.html – Build your first AutoML model and use it as a baseline.
Computer Science & Science
https://www.assemblyai.com/blog/how-to-train-large-deep-learning-models-as-a-startup/ – How to train large models without extreme costs.
https://www.quantamagazine.org/how-wavelets-allow-researchers-to-transform-and-understand-data-20211013/ – You all know the Fourier transform. Wavelets are the next level.
https://vscode.dev/ – VS Code online. It runs smoothly right in your browser. No installation needed.
Graphs and Visualizations
https://aegeorge42.github.io/ – Visual tutorial on neural networks. And it’s a really good introduction.
https://blog.djnavarro.net/posts/2021-10-19_rtistry-posts/ – Widely shared article about generative art in R, or more precisely about the tools you can use to create a beautiful picture.
https://www.analyticsvidhya.com/blog/2021/10/10-ideas-that-every-professional-should-avoid-for-data-visualization/ – Simple but effective tips on what to do and what to avoid in visualizations.
Business and Career
https://electrek.co/2021/10/14/tesla-officially-launches-insurance-using-real-time-driving-behavior-texas/ – Tesla offers an insurance product that will be based on your driving score.
https://www.crowdfundinsider.com/2021/10/181744-digital-banking-fintech-revolut-performs-updates-to-enhance-security-streamline-ux/ – Revolut shipped some updates, such as a personal logo on invoices and several security-related improvements.
https://www.theverge.com/2021/10/19/22735612/facebook-change-company-name-metaverse – FB wants to rebrand itself to cover the whole metaverse of different products that it offers, similar to what Google did in the past (Alphabet).
Pop
https://venturebeat.com/2021/10/16/the-2021-machine-learning-ai-and-data-landscape/ – What happened in MAD (ML, AI, Data) in 2021? Are centralized data lakes a thing of the past? Should we all start building a distributed data mesh? And what is reverse ETL? (rcmd by reader)
https://van-magazine.com/mag/jan-swafford-beethoven-x/ – We hear it all the time: AI finished a painting or composed Beethoven’s 10th symphony, AI is better than humans at art, chess, etc. This short essay aptly explains why that is ridiculous and why the 10th symphony in question is really bad.
https://blog.seznam.cz/2021/10/diky-neuronove-siti-jsme-zlepsili-vysledky-vyhledavani-a-detekujeme-clickbaitove-titulky/ – Seznam released Small-E-Czech (smolíček), a neural network that improves search results, and made it available on GitHub as well. (rcmd by reader)
Education
https://farid.one/kaggle-solutions/ – List of almost all available Kaggle solutions and ideas shared by top performers in past competitions.
https://github.com/acmi-lab/cmu-10721-philosophy-machine-intelligence/blob/main/schedule.md – Interesting materials for the course Philosophical Foundations of Machine Intelligence from Carnegie Mellon University.
https://towardsdatascience.com/shap-explain-any-machine-learning-model-in-python-24207127cad7 – Most of you probably already know SHAP. Here is quite a long guide on how to use it properly.
Data & Libraries
https://github.com/kiv-air/Czert – Introducing CZERT, a Czech BERT. (rcmd by reader)
https://pandera.readthedocs.io/en/latest/index.html – Pandera performs data validation on pandas data structures at runtime. (rcmd by reader)
https://www.kdnuggets.com/2021/10/query-pandas-dataframes-sql.html – FugueSQL allows you to query pandas data frames with SQL statements.
MLOps
https://testdriven.io/blog/docker-best-practices/ – List of best practices for Docker. And yes, it’s useful even (mainly) for advanced users. (rcmd by reader)
https://airbyte.io/blog/airflow-etl-pipelines – Use Airflow to schedule and monitor ELT pipelines, but not to run the extract, load and transform steps themselves.
https://towardsdatascience.com/7-considerations-before-pushing-machine-learning-models-to-production-efab64c4d433 – Some useful points to keep in mind when going to production with your models.
Video & Podcast
https://www.youtube.com/watch?v=7gSU_Xes3GQ – How does the surveillance system in China work? Who owns the data? And where does the data come from? Watch this 40-minute documentary.
Papers & Books
https://arxiv.org/pdf/2003.02320.pdf – Introduction to knowledge graphs. They are not dead! And they are still probably the best choice for your chatbot. (rcmd by reader)
Behind the Fence
https://boards.greenhouse.io/quorum/jobs/3464510 – Senior Software Engineer at Quorum, Washington, DC, USA.
Joke
https://i.redd.it/nxcvj66kig071.jpg – Optimize!
