• earthworm@sh.itjust.works
    English · 15 upvotes · 4 hours ago

    Besides selling the most sought-after hardware, NVIDIA is also developing its own models, including NeMo Megatron models. These were trained using NVIDIA’s own hardware and with help from large text libraries, much like other tech giants do.

    As the case progressed, the authors also brought up NVIDIA’s contacts with Anna’s Archive, inquiring about “high-speed access” to the shadow library’s massive collection of pirated books.

    This is probably why Anna’s Archive hasn’t been taken down yet - the big fish are pirating, too.

    • starweasel [it/its, comrade/them]@hexbear.net
      English · 4 upvotes · 4 hours ago

      scripts that NVIDIA distributed to clients so they could automatically download and preprocess The Pile dataset.

      sounds like they allegedly wrote some stuff to get faster downloads/avoid throttling while they were allegedly pirating books from shadow libraries for their AI

    • Chahk@beehaw.org
      English · 5 upvotes · 4 hours ago

      In addition, the motion also targets the contributory copyright infringement allegations, which center on scripts and tools NVIDIA allegedly distributed so corporate customers could automatically download ‘The Pile,’ the dataset that contains Books3.