Guess who’s back? It’s a DSB! The weekend may be coming to an end, but don’t worry, you’ll have plenty of time to read through our latest volume. You’ve probably noticed that over the years we have gradually changed our publishing frequency from weekly to fortnightly, then to every three weeks, and now to monthly. The reason is simple: each volume takes more and more hours to complete, and we don’t want to turn this into a “GPT bulletin”. Although we still have lots of cool links about this amazing tool, we also want to read interesting articles about more “mundane” topics, and finding those is our goal.
I would recommend the Duolingo growth model in Analytical, or the 12 points for checking a company’s data science maturity in Business and Career.
And as always, enjoy your reading.
https://blog.duolingo.com/growth-model-duolingo/ – Insight into Duolingo’s growth model. Which metrics did they choose? And how do they ensure ongoing growth?
https://www.quantamagazine.org/cryptographers-show-how-to-hide-invisible-backdoors-in-ai-20230302 – What is the role of cryptography in understanding the behavior of ML models?
Computer Science & Science
https://arstechnica.com/information-technology/2023/02/chatgpt-on-your-pc-meta-unveils-new-ai-model-that-can-run-on-a-single-gpu/ – Meta released a smaller LLM called LLaMA-13B (LLaMA = LLM Meta AI) that reportedly performs better than GPT-3 and could run on PCs, smartphones, etc.
https://aiguide.substack.com/p/why-the-abstraction-and-reasoning – Even though ChatGPT is no doubt impressive, there are still tasks it cannot do. One example is ARC (the Abstraction and Reasoning Corpus), created by François Chollet, which tests the ability to form and understand abstract concepts. I also recommend his Twitter thread on the topic from a few days ago.
– OpenAI has released APIs for ChatGPT and Whisper. Guidelines for the ChatGPT API are available here and for the Whisper API here. The definition of a token is quite important for pricing.
Graphs and Visualizations
https://ai.googleblog.com/2023/02/a-vision-language-approach-for.html – Google introduces Spotlight, a “smaller” model for understanding user interfaces (UI).
https://100.datavizproject.com/ – One hundred ways to visualize one dataset. Quite a good source of inspiration! Some of the graphs are weird at best, or even useless, but it is still worth a look.
https://graph-tool.skewed.de/ – Do you need to create a pretty graph with a great many nodes? Try graph-tool for manipulation and statistical analysis of networks.
Business and Career
https://aeturrell.github.io/markov-wanderer/posts/data-science-maturity/data-science-maturity.html – What’s your company’s maturity in data science? Check these 12 points. I agree with all of them.
– Supermarkets are selling data about their customers to brands and advertisers.
https://www.synq.io/blog/europe-data-salary-benchmark-2023 – What is the salary benchmark for data analysts, data scientists, analytics engineers and data engineers in Europe?
https://www.niemanlab.org/2023/02/meet-the-first-ever-artificial-intelligence-editor-at-the-financial-times/ – Madhumita Murgia becomes the very first AI editor at the Financial Times.
https://www.pewresearch.org/internet/2023/02/24/the-future-of-human-agency/ – Are we more in control of our lives thanks to AI, or is it exactly the opposite?
https://stanfordblockchainreview.substack.com/p/nouns-dao-and-the-philosophy-of-governance – DAO was a completely new term for me. This case study showcases the principles and mechanisms of self-governance of Nouns DAO. As the acronym suggests, a DAO is a decentralized autonomous organization, and the members of this one are the owners of the NFT collection.
https://soatok.blog/2023/03/01/database-cryptography-fur-the-rest-of-us/ – Very comprehensive article about database cryptography.
https://dcai.csail.mit.edu/ – Introduction to Data-Centric AI (DCAI), a course from MIT.
https://www.kdnuggets.com/2023/02/getting-started-python-generators.html – Nice intro to Python’s generators.
Datasets & Libraries
https://github.com/julkaar9/pynimate – Pynimate is a Python package for statistical data animations.
https://github.com/dstackai/dstack#readme – dstack is an open-source tool that streamlines the process of creating reproducible ML training pipelines that are independent of any specific vendor.
MLOps & MLReg
https://www.carted.com/blog/building-an-efficient-machine-learning-api/ – A high-accuracy, low-latency product categorization endpoint built by the ML team at Carted.
https://mindfulmodeler.substack.com/p/the-way-of-model-agnostic-machine – Let’s try model-agnostic modeling: don’t care about the algorithm, care only about performance.
https://motherduck.com/blog/big-data-is-dead/ – Big data used to be the hot topic. Is its time over? And why?
Video & Podcast
https://animatedai.github.io/ – Animations and instructional videos about neural networks.
https://player.fm/series/series-3446985/mikolov-opravdova-umela-inteligence-je-daleko-chatgpt-nepremysli-je-to-jen-statisticky-model – An interview (in Czech) with Mikolov in which he explains why AGI is still far off and why ChatGPT is not intelligent. (rcmd by reader)
Papers & Books
https://sites.google.com/view/stablediffusion-with-brain/ – This paper on reconstructing visual experiences from human brain activity was heavily hyped, especially on social networks, but as Luise Steuckart pointed out, it doesn’t translate your thoughts into images in general, amazing as that would be. (rcmd by reader)
https://arxiv.org/pdf/2302.00487.pdf – Continual learning for AI systems.