One of GitHub's mainstay contributors announced they are abandoning ship due to constant outages. GitHub's COO has responded, promising change, but is it too little, too late?
Microslop bought GitHub for the training data. That’s it. That was the whole point.
The funniest part is that their model is considered to be rather shit-tier.
What? Microsoft bought GitHub in 2018. ChatGPT was released 4 years later. The AI boom wasn’t a thing when MS was buying GitHub, and no one was thinking about using it for data back then. Cloud was the big thing in 2018, and MS bought GitHub to integrate it with Azure and sell computing to people using GitHub Actions.
LLMs are just one way to monetize the data. I would put my hand in the fire that Microsoft used the data as soon as they bought GitHub.
Yes, but they specifically said “training data”, which implies its use in LLMs. I agree they wanted user data, same as with LinkedIn, but I doubt they were thinking about “training data” in 2018.
Google Voice was also a service designed to gather training data for speech-to-text / text-to-speech services at Google. That’s why it was free. The advent of LLMs just gave it something else to plug the data into. The Microslopening of GitHub, at its core, had similar motivations. Having effectively full backend visibility into all content on the (at the time) centralized service that damn near everyone who published their code was using was a valuable business proposition even before they shoved it all into a training set.
We’re talking about using code to train models, which wasn’t a thing until LLMs were able to generate code, and that came after they bought GitHub. I’m pretty sure in 2018 they weren’t looking at GitHub as a source of training data. It was a way to get developers to use their tools. Everyone was using GitHub and MS wanted to market their products to them. First Azure, now Copilot.