Happy Sunday and happy DSB time with this volume! I recommend reading the article about backpropagation from Education and having a look at WeightedSHAP from Datasets & Libraries.
And as always, enjoy your reading.
https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor – Matrix multiplication is now faster thanks to AlphaTensor. But before you use it for every matrix multiplication, read this article that explains when it’s useful and when you should stick with good old regular matrix multiplication.
– Extremely interesting view on the importance of proper UX for LLMs (large language models), with many concrete examples. Still a heavily underestimated area.
Computer Science & Science
– Cloud is costly, and more and more companies are refusing to increase their spending on it. 39% decided to move, or keep, significant cloud consumption and high-performance workloads on premises. Of course, the cloud will not perish: a lot has already changed thanks to it, and a lot more will.
https://code-as-policies.github.io/ – Using LMPs (language model generated programs) based on LLMs to write robot policy code.
https://deepnote.com/blog/future-of-notebooks-cl9q8v33jd5z60an5piaaqfut – Notebooks have been here with us since 1988. Read this short article and learn about their history.
Graphs and Visualizations
– Catchy and important discussion about the morality of Stable Diffusion. The issue is not only its ability to imitate the work of (living) human artists; without their work, it would never have existed in the first place. Definitely read the comments, too.
https://ft-interactive.github.io/visual-vocabulary/ – Easy and quick to use visual vocabulary. Choose a proper graph for your visualization.
https://www.theverge.com/2022/11/3/23438604/text-to-image-ai-openai-dall-e-api-public-beta-price – We mentioned DALL-E (text-to-image) in the previous DSB. And now OpenAI launches the DALL-E API.
Business and Career
https://techcrunch.com/2022/11/04/twitter-porn-onlyfans-elon-musk/ – You have probably already noticed: Elon Musk bought Twitter and has started to clean house rather abruptly to make it profitable, among other things. And maybe the easiest way to achieve this goal is (no surprise): porn.
https://www.cnbc.com/2022/10/13/apple-goldman-sachs-introduce-interest-bearing-savings-accounts.html – Apple is expanding its financial services and plans to use iPhones as POS (point of sale) terminals, allow “buy now, pay later”, and also offer savings accounts.
– Now even LeCun admits that we need AI to be able to reason if we want human-level AI. The article contains an interview with the master himself.
https://www.vice.com/en/article/m7g5yq/students-are-using-ai-to-write-their-papers-because-of-course-they-are – This is amazing! Students are using AI to write their papers. Hopefully the education systems around the world will not fight it and instead incorporate it.
https://www.bleepingcomputer.com/news/security/microsoft-sued-for-open-source-piracy-through-github-copilot/ – Programmer and lawyer Matthew Butterick has sued Microsoft, GitHub, and OpenAI, because of GitHub’s Copilot. One does not simply use someone else’s data, not even Microsoft.
https://karpathy.medium.com/yes-you-should-understand-backprop-e2f06eab496b – Even the most basic concepts like backpropagation are important and you should understand them; otherwise you cannot tell what the neurons in your NN are doing or whether they are working at all. So read this article and be smarter next time.
https://docs.google.com/presentation/d/1khY_li29A5aUo_cEVRsvO8pcRn7Xp9Bi – Long and detailed presentation on the interpretation of ML models by Hima Lakkaraju from Harvard. Do not be discouraged by the length; it is worth it.
https://mljar.com/blog/jupyter-notebook-presentation/ – Simple manual on how to create a presentation from Jupyter Notebook.
Datasets & Libraries
https://github.com/ykwon0407/WeightedSHAP – Probably every data scientist knows about SHAP, so if you use it, give WeightedSHAP a try. According to the authors, it can identify more interpretable features.
MLOps & MLReg
https://netflixtechblog.com/orchestrating-data-ml-workflows-at-scale-with-netflix-maestro-aaa2b41b800c – Netflix is introducing Maestro, a data workflow orchestration platform with multiple levels of execution abstractions. It is a general-purpose workflow orchestrator that provides a fully managed workflow-as-a-service (WAAS).
https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2022/09/ico-publishes-guidance-on-privacy-enhancing-technologies/ – The Information Commissioner’s Office (ICO) in the UK has published draft guidance on privacy-enhancing technologies (PETs), which are already used by financial organisations, among others. These methods help you use and protect the data at the same time. (Recommended by a reader.)
https://www.lupa.cz/clanky/tvurci-umele-inteligence-budou-muset-pred-soudem-dokazovat-ze-neskodili/ – (Article in Czech.) Responsibility for AI will always lie with its creator, who will have to disclose any information if a court orders it. The creator will, however, get a chance to exonerate themselves by proving they were not to blame for the accident.
Video & Podcast
Papers & Books
Behind the Fence
https://careers.airbnb.com/positions/4668796/ – Staff Data Scientist – Search, Inference at Airbnb in the USA.