
DSB #119

Hi,

Before we skate into the weekend, let’s read the DSB! This volume is full of ridiculously good articles. I would recommend the one in Computer Science & Science, where you can learn how Bitcoin really works in pure Python. Or the beautiful intro to Kafka in Graphs and Visualizations. Or the story about building a data-driven company in Business and Career. And many more, so just go through it all.

And as always, enjoy your reading.

Analytical

https://www.aidancooper.co.uk/scottish-people-are-more-inclined-to-skip-the-gym/ – Scottish people would rather stay at home and watch soccer than torture themselves in the gym. Or at least something like that can be found in this interesting analysis.

https://deepnote.com/@ashish-karhade/Apple-Music-Streaming-analysis-RZehtt6QT5q1nWDC6mwXrQ – EDA of music streaming data from Apple Music. Very nicely done!

https://towardsdatascience.com/causal-inference-example-elasticity-de4a3e2e621b – Price elasticity estimation done in scikit-learn.
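To make the term concrete, here is a minimal sketch of the simplest textbook estimate: under an assumed constant-elasticity demand curve Q = A · P^b, the elasticity b is the slope of ln(Q) regressed on ln(P). It uses only the standard library and hypothetical data; it is not the article's causal-inference approach, and on real observational prices such a naive regression can be biased by confounders.

```python
import math

def log_log_elasticity(prices, quantities):
    """Estimate constant price elasticity as the OLS slope of ln(Q) on ln(P)."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(ln P, ln Q) / variance(ln P)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical demand data that follows Q = 100 * P ** -1.5 exactly.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
quantities = [100 * p ** -1.5 for p in prices]
print(round(log_log_elasticity(prices, quantities), 3))  # -1.5
```

An elasticity of -1.5 means that a 1% price increase reduces the quantity demanded by roughly 1.5%.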

Computer Science & Science

https://karpathy.github.io/2021/06/21/blockchain/ – Everybody talks about crypto and blockchain, but almost nobody really understands what is hidden under the hood. In this article, Andrej Karpathy shows how to create, digitally sign, and broadcast a Bitcoin transaction in pure Python. And the code itself is amazing. (rcmd by reader)
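One small building block worth seeing up close: Bitcoin identifies a transaction by the double SHA-256 of its serialized bytes, conventionally displayed in reversed byte order. The sketch below takes a shortcut with the standard library's `hashlib` rather than following the article's code, and the input bytes are a hypothetical placeholder, not a real serialized transaction.

```python
import hashlib

def hash256(data: bytes) -> bytes:
    """Bitcoin-style double SHA-256: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def txid(raw_tx: bytes) -> str:
    """Transaction ID: hash256 of the serialized tx, shown in reversed byte order."""
    return hash256(raw_tx)[::-1].hex()

# Hypothetical placeholder bytes standing in for a serialized transaction.
raw_tx = bytes.fromhex("0100000001")
print(txid(raw_tx))  # 64 hex characters
```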

http://willwhitney.com/parallel-training-jax.html – Parallelizing neural network training on a single GPU with JAX for higher efficiency.

https://realpython.com/python-counter/ – Learn to use Python’s Counter from the collections module to count objects.

Graphs and Visualizations

https://www.gentlydownthe.stream/ – This is hilarious: how Apache Kafka works, told as a beautiful presentation or visualization. Whatever it is, it’s a lovely introduction. (rcmd by reader)

https://austingil.com/svg-favicons – Learn how to use and create SVG favicons.

https://ml.berkeley.edu/blog/posts/clip-art/ – OpenAI’s CLIP model is behind much of today’s AI-generated art. Read the story of CLIP and its new variations in this long article.

Business and Career

https://erikbern.com/2021/07/07/the-data-team-a-short-story.html – Another perfect read about building a data-driven company: what the necessities are, why modelling is usually the least important part, and much more. (rcmd by reader)

https://arstechnica.com/gadgets/2021/07/new-google-pay-debit-card-lets-you-actually-spend-the-money-people-send-you/ – In the US, Google is releasing the Google Pay Balance Card, a Visa debit card with NFC tap-and-pay. Google Pay also offers P2P payments without the need for a bank account, and Google bank accounts are supposed to launch sometime this year.

https://sifted.eu/articles/revolut-losses-results-2020/ – Revolut doubled its losses last year and might have trouble obtaining a UK banking license.

Pop

https://www.aimyths.org/ – A slightly confusing website that will mythbust your opinions about AI. (rcmd by reader)

https://www.oracle.com/news/announcement/oracle-and-deutsche-bank-2021-06-24/ – Deutsche Bank is migrating to Oracle Exadata Cloud. (rcmd by reader)

https://theconversation.com/languages-dont-all-have-the-same-number-of-terms-for-colors-scientists-have-a-new-theory-why-84117 – Why don’t all languages have the same number of terms for colors?

Education

https://github.com/ankurchavda/SparkLearning – More than 60 points on Spark theory; could be useful reference material.

https://deepnote.com/@bala-priya/Guide-to-Cross-Validation-and-Hyperparameter-Search-aXKLhfeNSu6MKgtckqRMnw – A classic topic: a comprehensive intro to cross-validation and hyperparameter search.
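The core loop behind both ideas fits in a few lines. This is a minimal, standard-library sketch, not the guide's code: shuffled k-fold splitting, a mean squared error averaged over folds, and a grid search over one hyperparameter. The 1-D k-nearest-neighbours regressor and the noisy data are hypothetical choices for illustration; in practice, scikit-learn's `KFold` and `GridSearchCV` handle this for you.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled folds covering range(n)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def knn_predict(train_x, train_y, x, n_neighbors):
    """Hypothetical 1-D k-NN regressor: average y of the closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return statistics.mean(train_y[i] for i in nearest)

def cv_mse(xs, ys, n_neighbors, k=5):
    """Mean squared error of the model, averaged over k folds."""
    fold_errors = []
    for train, test in k_fold_indices(len(xs), k):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        errors = [(knn_predict(tx, ty, xs[i], n_neighbors) - ys[i]) ** 2 for i in test]
        fold_errors.append(statistics.mean(errors))
    return statistics.mean(fold_errors)

# Hypothetical noisy data: y = x^2 plus alternating +/-0.3 noise
xs = [i / 10 for i in range(50)]
ys = [x * x + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]

# Grid search: score each candidate hyperparameter with cross-validation
scores = {n: cv_mse(xs, ys, n) for n in (1, 3, 5, 10)}
best = min(scores, key=scores.get)
print(scores, "best n_neighbors:", best)
```

Each candidate is scored only on data it was not fitted on, which is the point of cross-validation: the chosen hyperparameter reflects generalization rather than fit to the training points.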

https://github.com/microsoft/ML-For-Beginners – A 12-week, 24-lesson ML curriculum from Microsoft.

Data & Libraries

https://github.com/fabsig/GPBoost – GPBoost is a C++ library for combining tree boosting with Gaussian process and mixed-effects models. There are Python and R versions as well. (rcmd by reader)

https://facebookresearch.github.io/Kats/ – After Prophet (see DSB #70 or DSB #112), Facebook is releasing another time-series library called Kats. (rcmd by reader)

MLOps

https://medium.com/paypal-tech/400-days-paypals-data-warehouse-migration-to-google-bigquery-8c3b845eb6c9 – How PayPal migrated its data warehouse from Teradata to Google BigQuery: step by step rather than all at once, and with the help of automation, of course. (rcmd by reader)

https://eval.ai/ – Evaluate your ML and AI algorithms on this “alternative Kaggle”. (rcmd by reader)

https://towardsdatascience.com/learn-you-some-kedro-be67d4fc0ce7 – What is Kedro? It’s a Python framework for writing reproducible, maintainable and modular data science code with the help of software engineering concepts.

Video & Podcast

https://www.youtube.com/watch?v=2blLi3T4EGw – Andrej Karpathy, director of AI at Tesla, at the Workshop on Autonomous Vehicles held on June 20, 2021. (rcmd by reader)

Papers & Books

https://abseil.io/resources/swe-book – What does software engineering at Google look like, and how does it work? Find out in this book. (rcmd by reader)

https://arxiv.org/pdf/2011.14817.pdf – A paper about TailCor, a measure of tail correlation, i.e., dependence during rare events.

https://paperswithcode.com/paper/revisiting-deep-learning-models-for-tabular – Deep learning for tabular data. It seems there is still no universally superior solution: it’s a draw between gradient-boosted decision trees and DL models.

Behind the Fence

https://www.pythonjobshq.com/jobs/66817620-senior-software-engineer-at-truveris – Senior Software Engineer at Truveris, New York, USA.

Joke

https://i.redd.it/73plz3uas8a71.jpg
