Hopefully you have enjoyed your weekend to the fullest and can finish (or start) the day with DSB! What would I recommend in this volume? Go to Computer Science & Science and read the first two articles. The first one, about queues, is a must-read for every manager, and the second one is for everybody who writes more than 10 lines of code.
And as always, enjoy your reading.
https://datajenius.com/2022/03/13/a-deep-dive-into-nlp-tokenization-encoding-word-embeddings-sentence-embeddings-word2vec-bert/ – If you are into NLP, read this very long article about embeddings. It demonstrates multiple methods on real data.
https://blog.mlcontests.com/p/winning-at-competitive-ml-in-2022 – Who won ML competitions on Kaggle, AIcrowd and other platforms last year? Which languages and packages did they use? (Not R or TensorFlow…)
https://yoshuabengio.org/2022/03/05/generative-flow-networks/ – GFlowNets are the new hot topic in the data science world. Learn about them in this blog post by Yoshua Bengio. A tutorial can be found here.
Computer Science & Science
https://blog.danslimmon.com/2016/08/26/the-most-important-thing-to-understand-about-queues/ – An impressive short article about the unintuitive behavior of queues. It’s not only about CPUs; it also applies to the utilization of teams. And it explains a lot. To stay effective, you really should not go anywhere near 100% utilization.
http://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/ – It reads a string and writes out a string, and yet it’s extremely complex and difficult: an automated code formatter for Dart. An old-but-gold article about computer science, complexity and the use of algorithms.
https://horace.io/brrr_intro.html – Do you need to optimize your DL model? Then you need to understand what is happening behind the scenes.
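The queueing claim above (stay well below 100% utilization) is easy to check with a little arithmetic. Here is a minimal sketch, assuming the textbook M/M/1 model (random arrivals, a single server, exponentially distributed service times), which is not necessarily the model the linked article uses. The function name `mm1_mean_wait` and its parameters are made up for this illustration.

```python
def mm1_mean_wait(utilization: float, service_time: float = 1.0) -> float:
    """Mean time a job waits in the queue before service starts (M/M/1).

    Assumption: textbook M/M/1 result Wq = service_time * rho / (1 - rho),
    where rho is the utilization of the single server.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time * utilization / (1.0 - utilization)


# Print the mean wait, in multiples of the average service time,
# for a few utilization levels.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    wait = mm1_mean_wait(rho)
    print(f"utilization {rho:.0%}: mean wait = {wait:.1f} x service time")
```

Under these assumptions the wait grows as rho / (1 - rho), so going from 90% to 99% utilization makes the average wait roughly eleven times longer. The same shape of curve is why a fully booked team answers new requests so slowly.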
Graphs and Visualizations
https://towardsdatascience.com/how-to-build-effective-and-useful-dashboards-711759534639 – A good dashboard should answer the business question with 2 or 3 main graphs and should reflect user feedback. Build good dashboards!
https://mikkeldengsoe.substack.com/p/data-salaries-2022 – What are data salaries in the US and Europe? And how do they differ by seniority?
Business and Career
https://djpardis.medium.com/models-for-integrating-data-science-teams-within-organizations-7c5afa032ebd – Building a data science organization is not about fancy, empty proclamations; you need to change everything. The article presents several models for integrating a data science team into a company.
https://www.coindesk.com/tech/2022/03/17/understanding-the-technology-behind-decentralized-exchanges/ – If DeFi is supposed to be the future, one should understand it. This is the third article in the series about DeFi, and it covers the technology behind decentralized exchanges.
https://www.fastcompany.com/90724383/most-innovative-companies-data-science-2022 – Which companies drive their industries forward thanks to data science? Look at this really interesting list. (rcmd by reader)
https://hai.stanford.edu/research/ai-index-2022 – DSB #112 mentioned the AI Index report by Stanford University. A year later, the 2022 (fifth) edition has been released. The most interesting chapters for us are probably those on recommendation systems and NLP.
https://www.editorandpublisher.com/stories/the-new-york-times-expanding-our-data-journalism-ambitions,221259 – The New York Times is planning to invest more in its already amazing data-driven journalism. Unfortunately, most of the articles are behind the paywall, which is why they are mentioned so rarely in DSB.
https://techcrunch.com/2022/03/16/netflix-tests-a-new-feature-that-will-raise-prices-for-account-sharing/ – Netflix will recognize whether you share your account and make you pay more.
https://realpython.com/python-class-constructor/ – A comprehensive tutorial on class constructors in Python.
https://realpython.com/python-hash-table/ – Building a hash table in Python with test-driven development. You will also learn how Python’s hash function works.
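To give a taste of the hash table topic above, here is a minimal sketch of a hash table built on Python's built-in `hash()`, with a few assertions in the spirit of test-driven development. It is not the implementation from the linked tutorial: the class name `HashTable`, the `capacity` parameter and the choice of separate chaining (a list of key-value pairs per bucket) are assumptions made for this illustration.

```python
class HashTable:
    """Toy hash table using separate chaining (illustrative only)."""

    def __init__(self, capacity: int = 8) -> None:
        if capacity < 1:
            raise ValueError("capacity must be a positive integer")
        # One list of (key, value) pairs per bucket.
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # hash() maps the key to an integer; the modulo picks a bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value) -> None:
        bucket = self._bucket(key)
        for index, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[index] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1

    def __getitem__(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)

    def __len__(self) -> int:
        return self._size


# A few checks written before trusting the class, TDD style.
table = HashTable(capacity=2)  # tiny capacity forces collisions
table["apple"] = 1
table["banana"] = 2
table["cherry"] = 3
table["apple"] = 10  # update, must not grow the table
assert table["apple"] == 10
assert table["cherry"] == 3
assert len(table) == 3
```

With only two buckets and three keys, at least two keys must share a bucket, so the checks also exercise collision handling. A real implementation would additionally resize as it fills up and support deletion.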
Data & Libraries
https://future.a16z.com/emerging-architectures-modern-data-infrastructure/ – Data architecture cannot stay the same for years. It is changing rapidly. Read about these changes and have a look at modern patterns.
https://www.datamesh-architecture.com/ – In DSB #124, centralized data lakes were buried in favor of data mesh architecture. This link gives you a clear and understandable explanation from an engineering perspective.
https://nlathia.github.io/2022/03/Labelled-data.html – What types of labels are there, and where can you get them?
https://eugeneyan.com/writing/end-to-end-data-science/ – Forget multiple data science roles and become an end-to-end data scientist, because then you can deliver value like those at Stitch Fix or Netflix.
https://medium.com/news-uk-technology/the-0-1-done-strategy-for-data-science-3c1737de14b3 – When is a data science project done? How do you measure whether it is finished? The answer is a project completion matrix and the 0/1/Done strategy.
Papers & Books
https://astroautomata.com/paper/rediscovering-gravity/ – Wow, a GNN is able to discover orbital mechanics without knowing the actual parameters! The paper is available here.
https://omdena.com/blog/computer-vision-projects-github/ – A list of 7 computer vision papers with links to GitHub. (rcmd by reader)
https://www.kdnuggets.com/2022/03/best-data-science-books-beginners.html – Introductory data science books that are probably well known to most of you. But still a good list.
Behind the Fence
https://ebay.wd5.myworkdayjobs.com/en-US/apply/job/Applied-Researcher_R0051752 – Senior ML engineer at eBay, New York, USA.