DSB #139

Hi,

Happy Sunday, and happy DSB time with this volume! I recommend reading the article about backpropagation in the Education section and having a look at WeightedSHAP in Datasets & Libraries.

And as always, enjoy your reading.

Analytical

https://huggingface.co/blog/fine-tune-whisper – An explanation of OpenAI’s Whisper speech recognition model and a guide to fine-tuning it on the Common Voice dataset.

https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor – AlphaTensor, DeepMind’s AlphaZero-based agent, has discovered new, faster algorithms for multiplying matrices of certain sizes. But before you use it for every matrix multiplication, read this article, which explains when it’s useful and when you should stick with good old regular matrix multiplication.
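To make “faster” concrete: these algorithms save scalar multiplications. The classic example is Strassen’s 1969 algorithm, which multiplies 2×2 matrices with 7 multiplications instead of the textbook 8; AlphaTensor searches for decompositions of this kind for other sizes. The sketch below is plain Strassen, not an algorithm found by AlphaTensor, and the function names are our own.

```python
def naive_2x2(A, B):
    """Textbook 2x2 matrix product: 8 scalar multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    return [
        [a11 * b11 + a12 * b21, a11 * b12 + a12 * b22],
        [a21 * b11 + a22 * b21, a21 * b12 + a22 * b22],
    ]


def strassen_2x2(A, B):
    """Strassen's 2x2 matrix product: 7 multiplications, 18 additions/subtractions."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(strassen_2x2(A, B), naive_2x2(A, B))
```

Applied recursively to matrix blocks, saving one multiplication per level gives roughly O(n^2.81) instead of O(n^3), but the extra additions and memory traffic mean it only pays off for large matrices. That is part of why regular multiplication is often still the better choice.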

https://medium.com/innovationendeavors/the-biggest-bottleneck-for-large-language-model-startups-is-ux-ef4500e4e786
– An extremely interesting view on the importance of proper UX for LLMs (large language models), with many concrete examples. UX is still a heavily underestimated area.

Computer Science & Science

https://venturebeat.com/data-infrastructure/report-81-of-it-teams-directed-to-reduce-or-halt-cloud-spending-by-c-suite/
– The cloud is costly, and more and more companies are refusing to increase their spending on it: according to the report, 81% of IT teams have been directed by the C-suite to reduce or halt cloud spending, and 39% have decided to move significant cloud consumption and high-performance workloads on premises, or to keep them there. Of course, the cloud will not perish; it has already changed a lot, and much more will change thanks to it.

https://code-as-policies.github.io/ – Using LMPs (language model-generated programs) written by LLMs as robot policy code.

https://deepnote.com/blog/future-of-notebooks-cl9q8v33jd5z60an5piaaqfut – Notebooks have been with us since 1988. Read this short article to learn about their history.

Graphs and Visualizations

https://waxy.org/2022/11/invasive-diffusion-how-one-unwilling-illustrator-found-herself-turned-into-an-ai-model/
– A catchy and important discussion about the morality of Stable Diffusion: not only can it imitate the work of (living) human artists, but without their work it would never have existed in the first place. Definitely read the comments, too.

https://ft-interactive.github.io/visual-vocabulary/ – An easy and quick-to-use visual vocabulary that helps you choose the proper chart for your visualization.

https://www.theverge.com/2022/11/3/23438604/text-to-image-ai-openai-dall-e-api-public-beta-price – We mentioned DALL-E (text-to-image) in the previous DSB, and now OpenAI has launched the DALL-E API in public beta.

Business and Career

https://www.kaggle.com/kaggle-survey-2022 – It’s been a year since we shared the Kaggle ML & DS survey in DSB #126, so it’s time for new results!

https://techcrunch.com/2022/11/04/twitter-porn-onlyfans-elon-musk/ – As you have probably already noticed, Elon Musk bought Twitter and has started cleaning house rather abruptly, among other things to make it profitable. And maybe the easiest way to achieve this goal is (no surprise) porn.

https://www.cnbc.com/2022/10/13/apple-goldman-sachs-introduce-interest-bearing-savings-accounts.html – Apple is expanding its financial services: it plans to use iPhones as POS (point of sale) terminals, allow “buy now, pay later”, and, together with Goldman Sachs, offer interest-bearing savings accounts.

Pop

https://www.zdnet.com/article/metas-ai-guru-lecun-most-of-todays-ai-approaches-will-never-lead-to-true-intelligence/
– Now even LeCun admits that AI must be able to reason if we want human-level AI, and that most of today’s approaches will never lead to true intelligence. The article contains an interview with the master himself.

https://www.vice.com/en/article/m7g5yq/students-are-using-ai-to-write-their-papers-because-of-course-they-are – This is amazing! Students are using AI to write their papers. Hopefully, education systems around the world will not fight it and will incorporate it instead.

https://www.bleepingcomputer.com/news/security/microsoft-sued-for-open-source-piracy-through-github-copilot/ – Programmer and lawyer Matthew Butterick has sued Microsoft, GitHub, and OpenAI over GitHub Copilot, accusing them of open-source piracy. One does not simply use someone else’s data, not even Microsoft.

Education

https://karpathy.medium.com/yes-you-should-understand-backprop-e2f06eab496b – Even the most basic concepts, like backpropagation, are important and you should understand them; otherwise, you won’t know what the neurons in your NN are doing or whether they are working at all. So read this article by Andrej Karpathy and be smarter next time.
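One of the failure modes Karpathy’s article walks through fits in a few lines: the sigmoid’s local gradient is z·(1 − z), where z is the neuron’s output. It is at most 0.25 and almost zero once the neuron saturates, so the gradient flowing backwards shrinks or vanishes. A minimal plain-Python sketch (function names are ours; the multi-layer part ignores the weight matrices to isolate the effect of the activation):

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_local_grad(x: float) -> float:
    """Local gradient of the sigmoid at x, written via its output z: z * (1 - z)."""
    z = sigmoid(x)
    return z * (1.0 - z)


# How much of an upstream gradient of 1.0 survives one sigmoid neuron
# as its pre-activation x grows, i.e. as the neuron saturates.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:4.1f}  local gradient = {sigmoid_local_grad(x):.6f}")

# Chain rule: stacked layers multiply their local gradients. Ignoring the
# weights, even the best case (x = 0 everywhere) leaves at most 0.25 ** depth.
depth = 10
best_case_signal = sigmoid_local_grad(0.0) ** depth
print(f"best case after {depth} sigmoid layers: {best_case_signal:.2e}")
```

If your network’s gradients look like the last lines of this output, the neurons are not learning, which is exactly what you cannot see without understanding backprop.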

https://docs.google.com/presentation/d/1khY_li29A5aUo_cEVRsvO8pcRn7Xp9Bi – A long and detailed presentation on the interpretability of ML models by Hima Lakkaraju from Harvard. Do not be discouraged by its length; it is worth it.

https://mljar.com/blog/jupyter-notebook-presentation/ – A simple guide to creating a presentation from a Jupyter Notebook.

Datasets & Libraries

https://github.com/ykwon0407/WeightedSHAP – Probably every data scientist knows SHAP, so if you use it, try WeightedSHAP. According to its authors, it can identify more interpretable features.
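For intuition: a feature’s Shapley value can be written as an average of its mean marginal contributions at each coalition size, with every size weighted equally. As we read the authors’ description, WeightedSHAP replaces these equal weights with weights learned from data. The brute-force sketch below (exponential in the number of features) computes both the Shapley value and a hand-picked non-uniform weighting. The toy value function, the custom weights, and all function names are hypothetical, and this is not the WeightedSHAP API.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence

ValueFn = Callable[[FrozenSet[int]], float]


def marginal_by_size(v: ValueFn, n: int, i: int) -> list[float]:
    """Mean marginal contribution of feature i for each coalition size k = 0..n-1."""
    others = [j for j in range(n) if j != i]
    means = []
    for k in range(n):
        contribs = [v(frozenset(S) | {i}) - v(frozenset(S)) for S in combinations(others, k)]
        means.append(sum(contribs) / len(contribs))
    return means


def semivalue(v: ValueFn, n: int, i: int, weights: Sequence[float]) -> float:
    """Weighted average of the per-size marginal contributions (weights sum to 1)."""
    return sum(w * m for w, m in zip(weights, marginal_by_size(v, n, i)))


def shapley(v: ValueFn, n: int, i: int) -> float:
    """Shapley value: equal weight 1/n on every coalition size."""
    return semivalue(v, n, i, [1.0 / n] * n)


def toy_value(S: FrozenSet[int]) -> float:
    """Hypothetical model payoff: features 0 and 1 act alone, 0 and 2 interact."""
    return 3.0 * (0 in S) + 1.0 * (1 in S) + 2.0 * (0 in S and 2 in S)


n = 3
print("Shapley:", [round(shapley(toy_value, n, i), 3) for i in range(n)])
custom = [0.6, 0.3, 0.1]  # hand-picked non-uniform weights, for illustration only
print("Custom weights:", [round(semivalue(toy_value, n, i, custom), 3) for i in range(n)])
```

With equal weights, the interaction bonus is split evenly between features 0 and 2; shifting weight towards small coalitions gives feature 2 less credit, because it only matters once feature 0 is present.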

MLOps & MLReg

https://netflixtechblog.com/orchestrating-data-ml-workflows-at-scale-with-netflix-maestro-aaa2b41b800c – Netflix introduces Maestro, a data workflow orchestration platform with multiple levels of execution abstraction. It is a general-purpose workflow orchestrator that provides a fully managed workflow-as-a-service (WAAS).

https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2022/09/ico-publishes-guidance-on-privacy-enhancing-technologies/ – The UK’s Information Commissioner’s Office (ICO) has published draft guidance on privacy-enhancing technologies (PETs), which are already used by financial organisations (and not only by them). These methods let you use and protect data at the same time. (recommended by a reader)
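Differential privacy is one of the PETs the guidance discusses. To give a flavour of how such a method lets you use data while protecting individuals, here is a minimal sketch of the Laplace mechanism for a counting query. The dataset, the epsilon values, and the function names are illustrative assumptions, and a production system should use a vetted DP library rather than Python’s `random`.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records: list[bool], epsilon: float, rng: random.Random) -> float:
    """Epsilon-differentially private count via the Laplace mechanism.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so the noise scale is sensitivity / epsilon = 1 / epsilon.
    """
    return sum(records) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)
# Hypothetical data: did each of 1,000 customers default on a loan?
records = [rng.random() < 0.1 for _ in range(1000)]
print("true count:", sum(records))
for epsilon in (0.1, 1.0, 10.0):
    # Smaller epsilon = stronger privacy = noisier released answer
    print(f"epsilon={epsilon:>4}: released count = {private_count(records, epsilon, rng):.1f}")
```

The whole privacy/utility trade-off sits in one parameter: epsilon.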

https://www.lupa.cz/clanky/tvurci-umele-inteligence-budou-muset-pred-soudem-dokazovat-ze-neskodili/ – The creator will always bear responsibility for an AI and, if ordered by a court, will have to disclose any information. However, they will get a chance to clear themselves if they prove they are not to blame for the incident.

Video & Podcast

https://youtu.be/cdiD-9MMpb0 – AI legend Andrej Karpathy was a guest on the Lex Fridman Podcast. As usual, the episode is longer than the new Avatar movie, but probably also more interesting.

Papers & Books

https://lauraruis.github.io/2022/09/29/comm.html – One of the aspects of human communication that current state-of-the-art LLMs fail to understand is the pragmatics of natural language.

Behind the Fence

https://careers.airbnb.com/positions/4668796/ – Staff Data Scientist – Search, Inference at Airbnb in the USA.

Joke

[Image: dgw8lsslp6v91.jpg from redd.it]
