We will use Grok 3.5 (maybe we should call it 4), which has advanced reasoning, to rewrite the entire corpus of human knowledge, adding missing information and deleting errors.

Then retrain on that.

Far too much garbage in any foundation model trained on uncorrected data.

Source.

More Context

Source.

Source.

  • brucethemoose@lemmy.world · +23 · 1 hour ago (edited)

    I elaborated below, but basically Musk has no idea WTF he’s talking about.

    If I had his “f you” money, I’d at least try a diffusion or bitnet model (and open the weights for others to improve on), and probably 100 other papers I consider low-hanging fruit, before this absolutely dumb boomer take.

    He’s such an idiot know-it-all. It’s so painful whenever he ventures into a field you sorta know.

    But he might just be shouting nonsense on Twitter while X employees actually do something different. Because if they take his orders verbatim they’re going to get crap models, even with all the stupid brute force they have.

  • antihumanitarian@lemmy.world · +3 · 22 minutes ago

    Most if not all leading models use synthetic data extensively to do exactly this. However, the synthetic data needs to be well defined and essentially programmed by the data scientists. If you don’t define the data very carefully (ideally as math or programs you can verify as correct automatically), it’s worse than useless. The scope is usually very narrow: no Hitchhiker’s Guide to the Galaxy rewrite.
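
    To make “verify as correct automatically” concrete, here’s a minimal sketch of that narrow kind of synthetic data (the shape of the example is hypothetical): every target is computed rather than sampled, so a program can check each pair.

    ```python
    import random

    def synth_math_example(rng: random.Random) -> dict:
        # The target is computed, so it is correct by construction.
        a, b = rng.randint(2, 999), rng.randint(2, 999)
        return {"prompt": f"Compute {a} + {b}.", "target": str(a + b)}

    def build_dataset(n: int = 10_000, seed: int = 0) -> list[dict]:
        rng = random.Random(seed)
        data = [synth_math_example(rng) for _ in range(n)]
        # Automatic verification: re-derive every answer and compare.
        assert all(
            int(d["target"]) == sum(map(int, d["prompt"][8:-1].split(" + ")))
            for d in data
        )
        return data
    ```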

    But in any case he’s probably just parroting whatever his engineers pitched him to look smart and in charge.

  • Deflated0ne@lemmy.world · +28 · 2 hours ago

    Dude is gonna spend Manhattan Project-level money making another stupid fucking shitbot, trained on regurgitated AI slop.

    Glorious.

  • NigelFrobisher@aussie.zone · +5 · 1 hour ago

    I figure the whole point of this stuff is to trick people into replacing their own thoughts with these models, and effectively replace consensus reality with nonsense. Meanwhile, the oligarchy will utilise mass data collection via Palantir and ML to power the police state.

  • Sixty@sh.itjust.works · +1 · 30 minutes ago (edited)

    I think most AI corp tech bros do want to control information, they just aren’t high enough on Ket to say it out loud.

  • hansolo@lemmy.today · +1 · 49 minutes ago

    Prepare for Grokipedia to only have one article about white genocide, then every other article links to “Did you mean White Genocide?”

  • hector@sh.itjust.works · +9 · 3 hours ago

    This is it, I’m adding ‘Musk’ to my block list. I’m so tired of the pseudo-intellectual bullshit and the bad interpretations of science fiction work.

    • DoubleSpace@lemm.ee · +2 · 47 minutes ago

      I think deep down Musk knows he’s fairly mediocre intelligence-wise. I think the drugs allow him to temporarily forget that.

  • dalekcaan@lemm.ee · +136 · 5 hours ago

    adding missing information and deleting errors

    Which is to say, “I’m sick of Grok accurately portraying me as an evil dipshit, so I’m going to feed it a bunch of right-wing talking points and get rid of anything that hurts my feelings.”

  • Elgenzay@lemmy.ml · +18 · 4 hours ago

    Aren’t you not supposed to train LLMs on LLM-generated content?

    Also, he should call it Grok 5: so powerful that it skips over 4. That would be very characteristic of him.

    • hansolo@lemmy.today · +3 · 26 minutes ago (edited)

      Musk probably heard about “synthetic data” training, which is where you use machine learning to create thousands of things that are typical-enough to be good training data. Microsoft uses it to take documents users upload to Office365, train the ML model, and then use that ML output to train an LLM so they can technically say “no, your data wasn’t used to train an LLM.” Because it trained the thing that trained the LLM.
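
      As a toy illustration of synthetic data in that statistical sense (a generic sketch, not Microsoft’s actual pipeline): fit a small generative model to real records, then sample look-alikes to train on.

      ```python
      import numpy as np
      from sklearn.mixture import GaussianMixture

      # Stand-in for real user data; any tabular features would do.
      real_records = np.random.default_rng(0).normal(size=(1000, 4))

      # Fit a small generative model to the real records...
      gm = GaussianMixture(n_components=5, random_state=0).fit(real_records)

      # ...then sample "typical-enough" synthetic rows; no single row is
      # a real record, but the statistics resemble the originals.
      synthetic_rows, _ = gm.sample(10_000)
      ```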

      However, you can’t do that with LLM output and stuff like… History. WTF evidence and documents are the basis for the crap he wants to add? The hallucinations will just compound because who’s going to cross-check this other than Grok anyway?

    • Voroxpete@sh.itjust.works · +17 · 2 hours ago (edited)

      There are, as I understand it, ways that you can train on AI generated material without inviting model collapse, but that’s more to do with distilling the output of a model. What Musk is describing is absolutely wholesale confabulation being fed back into the next generation of their model, which would be very bad. It’s also a total pipe dream. Getting an AI to rewrite something like the total training data set to your exact requirements, and verifying that it had done so satisfactorily would be an absolutely monumental undertaking. The compute time alone would be staggering and the human labour (to check the output) many times higher than that.

      But the whiny little piss baby is mad that his own AI keeps fact checking him, and his engineers have already explained that coding it to lie doesn’t really work because the training data tends to outweigh the initial prompt, so this is the best theory he can come up with for how he can “fix” his AI expressing reality’s well known liberal bias.

    • brucethemoose@lemmy.world · +3 · 1 hour ago (edited)

      There’s some nuance.

      Using LLMs to augment data, especially for fine-tuning (not training the base model), is a sound method. The DeepSeek paper, for instance, is famous for using generated reasoning traces.

      Another is using LLMs to generate logprobs over text, and training not just on the text itself but on the *probabilities* a frontier LLM assigns to every ‘word.’ This is called distillation, though there’s some variation and complication. It’s also great because it’s more power/time efficient. Look up Arcee models and their distillation training kit for more on this, and read the code to see how it works.
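
      A minimal sketch of that logprob-matching idea (generic soft-target distillation with assumed names, not Arcee’s actual kit): the student is pushed toward the teacher’s full token distribution rather than just the sampled text.

      ```python
      import torch.nn.functional as F

      def distillation_loss(student_logits, teacher_logits, temperature=2.0):
          # Soften both distributions, then move the student's log-probs
          # toward the teacher's probs with a KL divergence over the vocab.
          t_probs = F.softmax(teacher_logits / temperature, dim=-1)
          s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
          # batchmean + T^2 scaling is the usual distillation convention.
          return F.kl_div(s_logprobs, t_probs, reduction="batchmean") * temperature**2
      ```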

      There are some papers on “self play” that can indeed help LLMs.

      But yes, the “dumb” way, aka putting data into a text box and asking an LLM to correct it (sketched after this list), is dumb and dumber, because:

      • You introduce some combination of sampling errors and repetition/overused word issues, depending on the sampling settings. There’s no way around this with old autoregressive LLMs.

      • You possibly pollute your dataset with “filler”

      • In Musk’s specific proposition, it doesn’t even fill knowledge gaps the old Grok has.
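
      For illustration, that text-box approach looks something like this hypothetical sketch (the model name is a small stand-in); note that nothing verifies the output before it would become training data:

      ```python
      from transformers import pipeline

      # Any causal LM would do here; "gpt2" is just a small stand-in.
      generator = pipeline("text-generation", model="gpt2")

      def naive_rewrite(passage: str) -> str:
          prompt = f"Correct any errors in the following text:\n{passage}\nCorrected:"
          out = generator(prompt, max_new_tokens=200, do_sample=True,
                          temperature=0.8, repetition_penalty=1.1)
          # Whatever the sampler emits becomes the "corrected" corpus;
          # no step checks it against sources, so sampling noise and
          # overused phrasings accumulate with every pass.
          return out[0]["generated_text"]
      ```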

      In other words, Musk has no idea WTF he’s talking about. It’s the most boomer, AI-bro, non-techy-ChatGPT-user thing he could propose.

      • MagicShel@lemmy.zip · +31 · 3 hours ago (edited)

        If we had direct control over how our tax dollars were spent, that would be different pretty fast. Might not be better, but different.

  • maxfield@pf.z.org · +95 / -2 · 6 hours ago

    The plan to “rewrite the entire corpus of human knowledge” with AI sounds impressive until you realize LLMs are just pattern-matching systems that remix existing text. They can’t create genuinely new knowledge or identify “missing information” that wasn’t already in their training data.

        • MajinBlayze@lemmy.world · +6 · 4 hours ago (edited)

          Try rereading the whole tweet; it’s not very long. It’s specifically saying that they plan to “correct” the dataset using Grok, then retrain with that dataset.

          It would be way too expensive to go through it by hand.

    • zildjiandrummer1@lemmy.world · +3 / -3 · 3 hours ago

      Generally, yes. However, there have been some incredible (borderline “magic”) emergent generalization capabilities that I don’t think anyone was expecting.

      Modern AI is more than just “pattern matching” at this point. Yes, at the lowest level that’s what it’s doing, but then you could also say human brains are just pattern matching at that same low level.

      • queermunist she/her@lemmy.ml · +5 · 3 hours ago

        Nothing that has been demonstrated makes me think these chatbots should be allowed to rewrite human history. What the fuck?!

  • Lumidaub@feddit.org · +56 · 7 hours ago (edited)

    adding missing information

    Did you mean: hallucinate on purpose?

    Wasn’t he going to lay off the ketamine for a while?

    Edit: … I hadn’t seen the More Context and now I need a fucking beer or twenty fffffffffu-

    • Carmakazi@lemmy.world · +26 · 6 hours ago

      He means rewrite every narrative to his liking, like the benevolent god-sage he thinks he is.

    • BreadstickNinja@lemmy.world · +2 · 6 hours ago

      Yeah, let’s take a technology already known for filling in gaps with invented nonsense and use that as our new training paradigm.