

We’re talking about using code to train models, which wasn’t a thing until LLMs were able to generate code, and that was after they bought GitHub. I’m pretty sure that in 2018 they weren’t looking at GitHub as a source of training data. It was a way to get developers to use their tools. Everyone was using GitHub, and MS wanted to market its products to them: first Azure, now Copilot.
Yes, but they specifically said “training data”, which implies use in LLMs. I agree they wanted user data, same as with LinkedIn, but I doubt they were thinking about “training data” in 2018.