DSB #140

Hi,

Hopefully you’re enjoying the weekend, and you will enjoy the DSB as well! As a skeptic, I recommend the article in Pop on why the Metaverse, Web3, and blockchain have failed. The cuML library by NVIDIA also seems impressive, so have a look at Datasets & Libraries.

And as always, enjoy your reading.

Analytical

https://www.lesswrong.com/posts/mkbGjzxD8d8XqKHzA/the-singular-value-decompositions-of-transformer-weight – SVD of transformer weight matrices and their interpretation. For those who want to refresh their knowledge about SVD, I would recommend this video series by the amazing Steve Brunton.

https://eugeneyan.com/writing/text-to-image/ – Text-to-image generation has exploded in the last few months. This article goes through some papers on diffusion, text conditioning, classifier guidance, and latent spaces.

https://blog.ml.cmu.edu/2022/11/28/causal-confounds-in-sequential-decision-making/ – Scalable algorithms for handling confounding in sequential decision-making using techniques from causal inference.

Computer Science & Science

https://openai.com/blog/chatgpt/ – OpenAI is still leading the LLM field. Now they have come out with ChatGPT, which performs much better in public tests than Galactica. The InstructGPT underneath looks like a promising step in research. You can try it here.

https://www.allthingsdistributed.com/2022/11/amazon-1998-distributed-computing-manifesto.html – Amazon published its distributed computing manifesto, which looks like a cornerstone of AWS and the software design that comes with it.

https://about.gitlab.com/handbook/business-technology/data-team/platform/sql-style-guide/ – GitLab has its own SQL Style Guide and enforces it with the SQLFluff linter.

Graphs and Visualizations

https://medium.com/@catmus2048/not-only-is-stable-diffusion-2-0-not-bad-but-really-better-my-prompt-engineering-experiments-459fbc5cec2 – Stability.ai released Stable Diffusion 2.0, and this article is full of generated pictures that could, subjectively, be considered beautiful.

https://dsego.github.io/demystifying-fourier/ – Impressive interactive visualization of Fourier analysis. Build your intuition.

https://netflixtechblog.com/for-your-eyes-only-improving-netflix-video-quality-with-neural-networks-5b8d032da09c – Netflix is improving video quality with neural-network-based video downscaling.

Business and Career

https://www.finextra.com/newsarticle/41190/money2020-us-wells-fargo-and-google-launch-ai-assistant-fargo – The American bank Wells Fargo noticed it was falling behind and, after a year of development, is launching its own virtual assistant. Will it be more successful than traditional voice assistants like Alexa, which seems to be a failure?

https://ryxcommar.com/2022/11/27/goodbye-data-science/ – The story of a data scientist who became a data engineer. “Nobody knew or even cared what the difference was between good and bad data science work. Meaning you could absolutely suck at your job or be incredible at it and you’d get nearly the same regards in either case.”

https://about.gitlab.com/handbook/business-technology/data-team/ – How are data teams at GitLab organized, and how do they measure impact?

Pop

https://americanaffairsjournal.org/2022/11/web3-the-metaverse-and-the-lack-of-useful-innovation/ – A long, detailed article describing the current technological bubble consisting of the Metaverse, Web3, and blockchain.

https://galactica.org/ – Meta published its own attempt in the field of large language models, but pulled Galactica after a strong backlash. A great post by Gary Marcus about what Galactica did wrong is here.

https://fchollet.substack.com/p/ai-is-cognitive-automation-not-cognitive – AI is not going to replace humans, at least not the current AI. François Chollet explains why.

Education

https://www.quantamagazine.org/ai-reveals-new-possibilities-in-matrix-multiplication-20221123/ – Another article on the breakthrough in matrix multiplication thanks to AI. And since it’s published in Quanta, it is a must-read.

https://thetenequations.readthedocs.io/en/latest/lesson1/Introduction.html – The FIFA World Cup is halfway through, so you still have time to learn how to build a prediction model in Python. The article is not only empirical but also theoretical.

https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md – Feature engineering is probably the most important part of modelling, yet it is not comprehensively covered in the literature. So for beginners this short guide could be very useful.

Datasets & Libraries

https://github.com/rapidsai/cuml – cuML is an extremely fast library for running traditional tabular ML tasks on GPUs without going into the details of CUDA programming. A random forest with 300 trees and max depth 10 was trained on data with 2.5 million rows and 54 columns in 15 seconds. (rcmd by reader)

https://github.com/Synerise/cleora – Cleora is an interesting project for learning embeddings for (not only) categorical data. There was a session by the Cleora makers at PyData Global.

MLOps & MLReg

https://fennel.ai/blog/challenges-of-building-realtime-ml-pipelines/ – How to handle realtime ML?

https://medium.com/at-the-front-line/ml-observability-hype-or-here-to-stay-acef064ff843 – ML Observability (not monitoring!) is a set of processes and tools required to maintain a healthy model in production. And it’s becoming an increasingly important part of MLOps.

https://medium.com/@zongheng_yang/skypilot-ml-and-data-science-on-any-cloud-with-massive-cost-savings-244189cc7c0f – SkyPilot is a platform that helps you optimize (costs of) cloud computing.

Video & Podcast

https://youtu.be/tVNoetVLuQg – An evolution project about the fight between predators and prey. (rcmd by Petr Petras)

https://lexfridman.com/guido-van-rossum-2/ – An amazing episode with the BDFL himself: Guido van Rossum, the author of Python, was invited to the Lex Fridman Podcast, and it could not be better. They discuss multiple topics, not only Python but also the future of programming and more.

https://youtu.be/8Ab3ArE8W3s – A great, engaging talk that points out the flaws in current development environments and provides interesting examples of what writing code could look like.

Papers & Books

https://jabde.com/2021/05/23/girlfriends-mood-time-series-analysis/ – The author analyzes his girlfriend’s behaviour. Can one generalize?

https://arxiv.org/ftp/arxiv/papers/2205/2205.02302.pdf – A paper giving an overview of MLOps.

Behind the Fence

https://www.epicgames.com/site/en-US/careers/jobs/4722707004 – Senior Data Scientist at Epic Games, New York City, USA.

Joke

https://sebastiancarlos.medium.com/how-i-quit-my-programmer-job-to-become-a-chicken-b733c20680b1