DSB #108


The week is over, so let’s jump into the weekend, but first read the DSB! What impressed me most were the pieces of amazing code in Computer Science & Science and the encoding of categorical variables in Papers & Books.

As always, enjoy your reading.

Analytical – Jupyter Notebooks running inside Excel, for real. You can call functions written in Python from Excel! (rcmd by reader) – Forget batch recommendations, the real thing is in real time! – JupyterLab 3.0 is here, read about the new features! (rcmd by reader)

Computer Science & Science – Impressive pieces of code like Apollo 11 or Quake III Arena and why they were so important. – Advent of Code in Clojure? No problem for Robert Martin, author of Clean Code. (rcmd by reader) – Translate between programming languages with the help of deep learning. (rcmd by reader)

Graphs and Visualizations – I am into graph databases, so I like this graph data science platform. Unfortunately, it runs in Docker, so it’s a no-go zone. But you can always try the cloud version. (rcmd by reader) – Analysis of 10,000,000 Jupyter Notebooks from GitHub, mainly nice graphs and their interpretation. (rcmd by reader) – Visualization of the CPython repository and its evolution over time, beautiful! (rcmd by reader)

Business and Career – Welcome Google Plex, it manages your day-to-day finances and banks should definitely take notice. (rcmd by reader) – Some of you may be millionaires thanks to bitcoin, but it’s useless as a currency. – Everybody is into finance these days, even Walmart has a fintech start-up. And it’s an interesting symbiosis between a retail shop and retail finance.

Pop – Without fans, there is no home advantage in soccer. (rcmd by reader) – This topic was tackled even by mainstream media. AI is able to be creative thanks to GPT-3 and come up with its own designs of, for example, avocado chairs. (rcmd by reader) – A nice article, written almost like a detective story, about how it took Google engineers 47 minutes to resolve an outage. Respect from me. (rcmd by reader)

Education – Boost your productivity in JupyterLab with these extensions. (rcmd by reader) – Intro to probabilistic ML – a book by MIT. – If you need a GUI built in Python, think about Tkinter.

Data & Libraries – dirty_cat does exactly what you would expect – it provides encoders that are robust! (rcmd by reader) – DeBERTa by Microsoft is state of the art in NLU (natural language understanding), it was built with 1.5 billion parameters (code here) and uses a two-vector approach. (rcmd by reader) – How to validate data with Cerberus, for example with the help of regex.

Video & Podcast – Encode your categorical variables – see the first article in Papers & Books. (rcmd by reader) – Tomáš Mikolov on NLP, working at Google and FB, current developments in AI, etc. (rcmd by reader) – SuperDataScience about new trends in data science in 2021. (rcmd by reader)

Papers & Books – How to encode categorical variables when you have many of them, too many to be handled by one-hot encoding. (rcmd by reader) – Sentiment in Chinese media and its effect on the stock market. (rcmd by reader) – How e-commerce is doing in the pandemic. (rcmd by reader)
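
To make the categorical encoding problem above a bit more concrete, here is a minimal sketch of one common workaround for high-cardinality columns, the hashing trick. It is not necessarily the method used in the linked paper; the function name `hash_encode`, the bucket count and the sample city names are all made up for illustration. The idea is that each category is hashed into a fixed number of buckets, so the vector length stays constant no matter how many distinct values show up.

```python
import hashlib
from typing import List


def hash_encode(value: str, n_buckets: int = 8) -> List[int]:
    """Map a category string to a fixed-length one-hot vector via hashing.

    Hypothetical helper for illustration only, not taken from any library.
    """
    # md5 is used only as a stable hash (not for security), so the same
    # string always lands in the same bucket across runs.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % n_buckets
    vector = [0] * n_buckets
    vector[index] = 1
    return vector


# Made-up sample data: a column with repeated category values.
cities = ["Prague", "Brno", "Ostrava", "Prague"]
encoded = [hash_encode(city) for city in cities]

for city, vector in zip(cities, encoded):
    print(city, vector)
```

The trade-off is that two different categories can collide in the same bucket, so the bucket count should be chosen with the number of distinct values in mind.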

Behind the Fence – Lead Data Engineer in Kettle, California, USA.

Joke


