DSB #133

Hi,

After a month’s pause, here is another volume of DSB. So sit tight, enjoy Easter, peel an egg and immerse yourself in reading. I especially recommend the very first article in Analytical, about the history and future of neural networks.

As always, enjoy your reading.

Analytical

https://karpathy.github.io/2022/03/14/lecun1989/ – History and future of deep neural nets by our favorite blogger Andrej Karpathy, director of AI at Tesla.

https://towardsdatascience.com/you-dont-need-neural-networks-to-do-continual-learning-2ed3bfe3dbfc – Continual learning is not only for neural networks; you can do it with XGBoost, LightGBM or CatBoost too.

https://forloopsandpiepkicks.wordpress.com/2022/01/09/finally-understanding-what-statistical-significance-and-p-values-mean-a-simple-example-with-r-code – Another brief but clear explanation of statistical significance and p-values.
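The linked post works in R. As a minimal sketch in Python (our own illustration, not the article's code), here is an exact two-sided p-value for a coin-flip experiment, assuming the null hypothesis is a fair coin. The function name `binomial_two_sided_p_value` is hypothetical.

```python
from math import comb


def binomial_two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for observing `heads` in `flips` tosses.

    Null hypothesis (assumed): the coin is fair. The p-value is the
    probability of an outcome at least as far from the expected count
    (flips / 2) as the observed one.
    """
    if not 0 <= heads <= flips:
        raise ValueError("heads must be between 0 and flips")
    expected = flips / 2
    distance = abs(heads - expected)
    # Count all head-counts k that are at least as extreme as the observation,
    # weighted by how many flip sequences produce each k.
    extreme = sum(
        comb(flips, k) for k in range(flips + 1) if abs(k - expected) >= distance
    )
    # Under a fair coin, every sequence of flips has probability 1 / 2**flips.
    return extreme / 2 ** flips


if __name__ == "__main__":
    p = binomial_two_sided_p_value(60, 100)
    print(f"p-value for 60 heads in 100 flips: {p:.4f}")
    print("significant at the 0.05 level:", p < 0.05)
```

With 60 heads out of 100 flips the p-value comes out at roughly 0.057, so the result is not significant at the usual 0.05 level, even though 60 heads may feel suspicious at first glance.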

Computer Science & Science

https://www.codingame.com/contests/spring-challenge-2022 – CodinGame spring challenge 2022 starts on April 21st.

https://blog.jetbrains.com/blog/2022/03/21/looking-at-python-through-the-eyes-of-a-neural-net/ – JetBrains released a full-line code completion plugin for Python in public beta. Have a look under the hood, starting with how the vocabulary for the neural net is created. (rcmd by reader)

https://cerfacs.fr/coop/fortran-vs-python – Why is Python so popular? A comparison with Fortran and other compiled languages, with a very interesting explanation.

Graphs and Visualizations

https://mobile.twitter.com/jvitek94/status/1508691846712770560 – What is the age distribution of hockey players in the Extraliga? Which team gives juniors the biggest chance, and which one bets on experienced veterans? (rcmd by reader)

https://spectrum.ieee.org/software-engineer-salary-2657117801 – Software engineering salaries in the USA in 5 charts. For instance, the most in-demand language seems to be Go.

https://openai.com/dall-e-2/ – DSB #108 mentioned DALL-E, an AI system that generates images from text using a version of GPT-3. Version 2 is even more realistic and offers 4x greater resolution. Here you can see what is possible.

Business and Career

https://twitter.com/punk6529/status/1509832349986562048 – A Twitter thread about an AI panel at a conference hosted by the European Commission. Basically, the EU is the best at regulation and sucks at everything else.

https://techcrunch.com/2022/04/14/what-hostile-takeovers-are-and-why-theyre-usually-doomed/ – Since Elon Musk is toying with Twitter, TechCrunch prepared an article on what hostile takeovers are and why they rarely succeed.

https://twitter.com/AriDavidPaul/status/1514309975962750993 – Why is there no crypto equivalent of the S&P 500 index? Read this Twitter thread.

Pop

https://statmodeling.stat.columbia.edu/2022/03/28/is-open-ai-cooking-the-books-on-gpt-3/ – How much is OpenAI’s GPT-3 affected by fine-tuning, and how large is the share of human intervention? Do not skip the discussion.

https://www.theverge.com/2022/3/31/23004326/facebook-news-feed-downranking-integrity-bug – Facebook’s downranking system was broken for six months, so views of misinformation spiked by as much as 30%.

https://spectrum.ieee.org/andrew-ng-data-centric-ai – An interview with Andrew Ng about current issues in AI and the data-centric approach.

Education

https://sparkbyexamples.com/pyspark-tutorial/ – A detailed tutorial and a great intro to PySpark. (rcmd by reader)

https://ubc-dsci.github.io/reproducible-and-trustworthy-workflows-for-data-science/README.html – Notes for the course Reproducible and Trustworthy Workflows for Data Science: a theoretical framework for practical DS problems.

https://github.com/dair-ai/ML-Notebooks – Code examples for multiple ML tasks and applications. 

Datasets & Libraries

https://github.com/pygod-team/pygod – Graph anomaly detection with PyGOD.

https://www.kdnuggets.com/2022/04/complete-collection-data-repositories-part-1.html – Multiple data repositories on topics like agriculture, audio, biology, climate, computer vision, economics, education, energy, finance and government.

MLOps

https://continual.ai/post/the-modern-data-stack-ecosystem-spring-2022-edition – What does the modern data stack look like nowadays? Which tools are current?

https://benn.substack.com/p/the-end-of-big-data – The hype about big data is dead, so the industry is finally starting to use it productively for ordinary challenges.

https://towardsdatascience.com/how-to-structure-a-data-science-project-for-readability-and-transparency-360c6716800 – Standardize your DS projects with tools like cookiecutter, Poetry, Makefile, Hydra, DVC, Git, black, flake8 and others.

Video & Podcast

https://youtu.be/KbB0FjPg0mw – Do you want to refresh your knowledge of probability theory? Then try this Harvard course. A Python implementation of the examples is available here and the solution manual here. And of course, you can buy the textbook, which is highly regarded. (rcmd by reader)

https://open.spotify.com/episode/5O0Qg3c9Sk5BqFoqzG1AlP – Data science YouTuber Tina Huang explains what she does to be a good data scientist. (rcmd by reader)

https://open.spotify.com/episode/1j0nQx8u2OTvwzhnmDV7NU – A discussion about innovation in banking: its specifics, foundations and recognition. (rcmd by reader)

Papers & Books

https://transformer-circuits.pub/2021/framework/index.html – Wow, an impressive, long and very complex paper that tries to reverse-engineer transformers to make them more understandable.

https://journals.sagepub.com/doi/abs/10.1177/00222429211066972 – Use bots rather than humans if the news is bad for the customer. At least according to this study. (rcmd by reader)

https://www.tmwr.org/ – Free book for R users about modeling in the Tidyverse. (rcmd by reader)

Behind the Fence

https://electricitymap.org/jobs/lead-data-engineer/ – Lead data engineer at electricityMap, Copenhagen, Denmark.

Joke

https://i.redd.it/yc3eljm3byt81.jpg
