135 - The Once and Future Linked Open Data
Episode transcript
Jonny: [0:00] I'm so glad that you're here with us, Dorothea, because I'm always interested in your perspective on this, having lived in the library world of linked data for so long. On the other end, living in programmer world, I still get both the persnickety, you know, purist side and the people that are trying to make it work. But very few people saying, "Actually, this doesn't even come close to meeting my needs or resembling my work style at all."
Jay: [0:32] I remember it was so funny: Scott Carlson helped edit or write that Linked Data in Libraries book, and then, like, two days later was like, "Linked data's dead," and then became a programmer. I love Scott.
Dorothea: [0:51] I think he was stuck in a deeply shitty workplace.
Jay: [0:55] And I agree. It happens to us, and then we get out of them. Hooray!
Jonny: [1:03] Proud of you.
Jay: [1:05] Yay. Okay.
Justin: [1:35] I'm Justin. I'm a ScholComm librarian, and my pronouns are he and they.
Sadie: [1:38] I'm Sadie. I work at a public library, and my pronouns are they/them.
Jay: [1:43] I'm Jay, and I'm no longer a music librarian. I'm finally a fucking cataloging librarian again, for the first time since finishing grad school seven years ago. And I won't say where. And my pronouns are he/him.
Justin: [2:09] Just post the address this time around.
Jay: [2:12] If you're in the Discord, you know.
Justin: [2:13] Okay. And we have guests. Would you like to introduce yourselves?
Dorothea: [2:17] Sure, I'll start. I'm Dorothea Salo, pronouns she/her, and I teach at the University of Wisconsin-Madison Information School.
Jonny: [2:26] I'm Jonny Saunders, they/them. I guess I do various forms of information-based work at UCLA.
Justin: [2:37] Thank you.
Jonny: [2:38] Yeah. For the belated applause. I was waiting for that.
Dorothea: [2:41] Thank you. Very kind.
Justin: [2:42] Welcome.
Welcome. I still have my reorganized soundboard, so I still only have, like, 10 sounds. Oh. No copyright law in the universe is going to stop me.
Jay: [3:00] I started making Justin watch It's Always Sunny, and it was a bad decision, because now the soundboard has the It's Always Sunny theme on it.
Jonny: [3:11] And it's got to be the full-length version, too. No soundboard is complete without the one that keeps going for an hour.
Justin: [3:17] Because that's...
Jay: [3:19] Just a piece of, like, public domain music. It's not even written for the show, I'm pretty sure.
Justin: [3:27] Sweet. I think I had just the full Soviet Union anthem. Yeah, I was like, "This is anthem one, you piece of shit." Yeah, that one keeps going. So this was an episode we came up with because Sadie wanted us to explain linked open data, and I think I probably know the second least, so I figured it would be funnest for me to start and try to explain what linked open data is. This is all from what I remember from grad school, which is the last time I ever had to interact with it that I'm aware of, besides, you know, the parts of linked data that are used by Google. You can think about it primarily as triples, where everything is one item linked to another item. So Hamlet is a character in Hamlet, the book. Those are two separate URIs. And then... it's a play. Well, it's in book form.
Jay: [4:39] Okay.
Justin: [4:43] And then Shakespeare is the author of Hamlet, so there's an "is the author of" statement, and each of these things has a URI. These three things can chain together forever, and that way you have something that's both machine-readable and human-readable, and somehow that makes the data boxes in Google work.
Dorothea: [5:05] Or certain extremely non-human-readable forms of human-readable.
Justin: [5:10] Right. So once you're trying to organize it in other ways, like, say, making a list of things, suddenly it doesn't work anymore, because now you have to see a series of statements.
Jonny: [5:23] Yep.
Sadie: [5:25] I'm, like, just chin-in-hands here, waiting for all of these super smart people.
Jay: [5:28] Literally. This is "We Explain Linked Data to Sadie."
Jonny: [5:37] Yeah. There's the triples explanation, and then immediately you fall off the cliff of ideology and 25 years of some of the most prickly and opinionated people in the world making claims on reality that you truly can't believe until you see them. So, you know, we get to talking about technology and beliefs. And then also, for a lot of people, a huge amount of wasted time, trauma, or success, depending on whether you work for Amazon or Google.
Jay: [6:06] Yeah. My experience with linked data is that I took ontology development in grad school with Dave Dubin, shout-out Dave Dubin. We learned RDF, and we mainly wrote in Turtle, I think, but we learned all the others too, like N-Triples and N3 and all that. But I think he liked Turtle the best, if I'm remembering.
Jonny: [6:30] The only one that worked.
Jay: [6:31] Yeah. As a class we collectively created an ontology together. Each of us had our own specific section of it that we had to create, and mine's still on my GitHub and everything. It's still, theoretically, a working RDF ontology.
Jonny: [6:51] Is this the origin of the Homosaurus?
Jay: [6:53] Yes, no. But I'm also on the Homosaurus, which is actually linked data, but none of us on the board actually interact with that part so much. We have a software dude who does that, but we all know about it to some degree. And then I've also done some Wikidata stuff. I went through one of those Wikidata trainings one summer, and that was cool. And I submitted a proposal for a paper on thinking of Wikidata and linked data as a cyborg kind of thing, but interrogating that. And I submitted it to the Code4Lib Journal issue that ended up being the one that everyone yelled at.
So I'm glad it got rejected. Literally, that issue, the one with the bad data practices, was the one I had submitted to. So I'm glad I got rejected now.
Jonny: [8:01] Narrow miss, narrow miss. Dorothea, weren't you the one that blew the whistle on that? Or is that a different time?
Dorothea: [8:10] It was, like, you and Becky.
Jay: [8:11] Right?
Dorothea: [8:12] Well, I mean, if we blew the whistle over anything, it wasn't over linked data. It was over privacy. It's a thing. You might want to let people keep it.
Jay: [8:25] Yeah.
Dorothea: [8:26] Yeah.
Jay: [8:27] Yeah, I just happened to be writing about linked data for the thing I was writing about. Right. So I'm very glad that my goofy little high-theory article got rejected.
Justin: [8:43] So I actually never ended up using Turtle. I think I learned it in N3 notation. The way I learned about it was very not hands-on, so it was never clear how it worked, except for the aspects that kind of pulled from Wikidata, which explained a little bit, but I never got an in-depth explainer for how Wikidata works. So it was very theoretical. My metadata teacher was very much on the theoretical side of things, so I never got to see a lot of practical applications of the stuff we talked about in class.
Dorothea: [9:24] So that is not how I teach metadata.
Jonny: [9:27] Yeah.
Dorothea: [9:29] If you're not doing, you're one step away. Yeah.
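For readers who, like Justin, only ever met triples on the whiteboard, here is a minimal Python sketch of the Hamlet example. It is an illustration under stated assumptions, not real data: the example.org URIs and the appearsIn/isAuthorOf predicate names are made up (real data would reuse an established vocabulary, for instance Dublin Core's creator property). It prints the triples in N-Triples, the plainest of the serializations Jay mentions, then follows the links from the character to the play to its author.

```python
# Hypothetical URIs: example.org is a reserved demo domain, and the predicate
# names are invented for illustration, not taken from a real vocabulary.
EX = "http://example.org/"

triples = [
    (EX + "Hamlet_character", EX + "appearsIn", EX + "Hamlet_play"),
    (EX + "Shakespeare", EX + "isAuthorOf", EX + "Hamlet_play"),
]

def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

def objects(triples, subject, predicate):
    """Follow a link forward: objects reachable from subject via predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(triples, predicate, obj):
    """Follow a link backward: subjects that point at obj via predicate."""
    return [s for s, p, o in triples if p == predicate and o == obj]

print(to_ntriples(triples))

# Chain two statements: character -> play it appears in -> author of that play.
play = objects(triples, EX + "Hamlet_character", EX + "appearsIn")[0]
authors = subjects(triples, EX + "isAuthorOf", play)
print(authors)  # ['http://example.org/Shakespeare']
```

Each statement is independent, which is what lets them "chain together forever"; it is also why something shaped like a list or a narrative has to be broken into many separate statements.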
Jonny: [9:32] Exactly. And that's one of the major cultural fissures: is it supposed to be something that you touch, or is it supposed to be a true artifact of the world that needs to be done once and never touched again? So the division between the teaching styles reflects the entire system of belief that goes into linked open data as well. I'm curious to hear people's origin stories with linked data. Dorothea, you've been doing this in libraries for a while. I'm curious what your origins really are.
Dorothea: [10:15] I mean, you know, I got into it the same way a lot of people did, as it started to be talked about as potentially where libraries would move after MARC. And, you know, that's a really awkward question when you think about it. Do we stick with the homegrown, if you will, MARC encoding, which we made up from scratch in the 1960s? Lord bless Henriette Avram, she was awesome. Right. But it doesn't map cleanly onto any of the dominant data structures, data models that we have today. It's pulling teeth to try to stuff MARC into a relational database such that you can actually do anything with it. You can kind of do it in XML, but XML is really squishy that way. And I don't mean that in a bad way. XML squishiness is actually quite useful. If you look at, for example, EAD, Encoded Archival Description, some of EAD is what you and I, Jonny, would probably think of as data. But a lot of EAD is narrative, right? It's storytelling. And you know what? Databases are shit at storytelling. You can't represent Hamlet in a database.
Dorothea: [11:35] Linked data is shit at storytelling.
One of the things that really pissed me off about the very early days of linked data was some of its boosters going around and bragging on it as something where you could literally represent anything, right? If you could put it in a computer, you could put it in linked data. And my retort to that is, as it has always been (and this is pure coincidence, but I kind of love it): all right, express Hamlet in RDF and get back to me, okay? You can't do it. And I was reading through some of the stuff in the show notes for today, and I happened on one of the Tim Berners-Lee pages. Let me see if I can find that. Ah, yes. And Tim Berners-Lee on this particular page talks about a semantic web, or sorry, a magical artificial intelligence. He's talking about AI.
Dorothea: [12:38] And he says this: "the concept of machine-understandable documents does not imply some magical artificial intelligence which allows machines to comprehend human mumblings." That's literally what he says. Human mumblings. Excuse you, Tim Berners-Lee. Excuse you. Language is one of the most magnificent things we have as human beings, and you are calling it mumblings? Excuse you very much. Sorry, that was my rant.
Jonny: [13:11] No, well felt. I mean, yeah, his relationship to the fuzziness of language is one of the most fascinating parts of the early outlooks on what linked data could be. On the one hand there's the romanticism of language, the fluidity of language as something to embrace, but then almost immediately that gets squished out. The thing that's almost immediately excluded is the ability for people to actually express ambiguity, uncertainty, and so on.
Jay: [13:45] I think last time you were on, Jonny, or maybe this was when we were watching it together, we talked about the Ted Nelson versus the Tim Berners-Lee view of the interconnected internet and data.
Jonny: [14:01] Right, let's see if I can find it. Interesting.
Dorothea: [14:05] Dude, Ted Nelson! I actually did get to meet him once.
Jonny: [14:08] Um...
Dorothea: [14:10] I was, like, wiped out at the time, unfortunately, but, uh, yeah, I will always treasure that. He was, and I think still is, an interesting dude.
Jonny: [14:22] Yeah.
Jay: [14:23] The Chad Ted Nelson.
Jonny: [14:27] So the story that I don't have a good hold on is, like, what happened? And this probably relates to some of the stuff that we talk about all the time in the cybersecurity screaming channel, and what you may have to deal with as well: the state of the technologies that go into libraries, and how they're not actually under any of our control, and we do the best we can to exist on whatever scraps IT wants to feed us. So I imagine that's intertwined with the story of why linked data didn't happen all the way at libraries, related to institutional inertia as well.
Dorothea: [15:07] Yeah, that's part of it. And, you know, getting back to my point about the question of getting off MARC: relational databases weren't going to work. XML wasn't going to work, and was in a little bit of a decline as we were asking ourselves this question. So what was left? I remember a blog post by Jonathan Rochkind, who hates linked data. Why does he hate RDF? And, you know, he backs it up. He's not just a random hater. But he was like, "We cannot move to this." And I'm like, okay, what's the alternative, right? And there are things about RDF that are attractive ideologically, but also practically, to libraries.
The idea of the "open" in linked open data: we can really, truly share, and OCLC can't stop us. Oops, did I say that out loud? Wow.
Dorothea: [16:08] I mean, you know, really, the elephant in the room is OCLC and its enclosure of MARC and MARC cataloging for its own corporate benefit. And I am going to call them corporate; I don't care that they're legally a nonprofit. So linked data, to some of us, looked like a possible way out of that. And, you know, I can't fault anybody for that. It's definitely a goal worth pursuing. So why didn't it get as far as we might have wanted it to? Part of it is that RDF was not really built for practicality or computability, right? Jonny can speak to this more, because they've read more of the STS and sociology literature around it than I have. I, as a complete SPARQL duffer (SPARQL, if you haven't run into it, is the query language for linked data; it is to RDF what SQL is to relational databases), can make a typo in a SPARQL query and knock a server over dead. It's not even hard.
Dorothea: [17:28] So the brittleness, whether you can even ask a question without killing a server, was not a consideration for the early designers of the semantic web. And how do you build a library infrastructure on a foundation that is that technologically brittle? The answer is you can't. You really, really can't.
Dorothea: [17:55] Another thing, and I'm not going to say this is a problem, actually; I actually think it was good. But it's a situation that does not commend itself to libraries, to librarians, right? We tend to be very orderly people.
And catalogers as much as anybody, and more than some. So in the aughts, right... well, no, not the aughts, in the teens, I guess, particularly in Europe, there was just this flowering of experimentation with how we were going to represent the things in the library universe, like books and maps and musical scores and movies and all that good stuff. How are we going to represent this in RDF? Lots of experimentation, and a lot of it was fantastic. Europe is great. Yeah, yeah, there was a lot of really good thinking, very practical thinking, going into this. But there were models, data models, RDF models, ontologies if you will, springing up all over the place. So if you're an average cataloger, you're looking at this and going, well, which one do I learn, and which one are we going to use? And when is there going to be a tool that works with any of this?
Jonny: [19:10] Yeah.
Dorothea: [19:10] And the answer is there wasn't. Now, what seems to have fallen out of that is that BIBFRAME, for all of its faults (and it has many; it is not my favorite bibliographic ontology), seems to be kind of taking over the world and muscling out a lot of that European experimentation. And that frankly makes me sad, because there are several countries in Europe that just plain kicked BIBFRAME's ass as far as modeling quality. And I hate that they're getting plowed under, basically, by this crappy American juggernaut. But why is this happening? Because there's finally tooling, finally cataloging tools that, as much as any RDF-based tool can, fail to suck.
Jay: [20:11] Yeah, like, I know you can do BIBFRAME stuff in Alma.
Dorothea: [20:13] Yeah, but you can look at Sinopia and you can look at Marva, and you can imagine an actual person using these, right, and making them work and getting good records out of them, which we didn't have for at least a literal, actual decade after BIBFRAME happened.
So when Tim Berners-Lee calls human language mumblings, I think it's a symptom of the contempt that so many linked data people have for human beings. And I yelled at the Semantic Web in Libraries conference in, like, 2014, a decade ago, about exactly that. Stop dissing human beings! You can't do that if you actually want linked data. But nobody listened, and here we are, right?
Jay: [21:06] Yeah. Another idea, and this was also something I think I talked with Jonny about, another idea for a goofy, high-minded theory paper I had was thinking of linked data as this attempt to do a reverse confusion of tongues, like a pre-Tower of Babel divine language that ignores the actual... The reason that linked data is cool is that it has the potential for everyone to have their own way of doing it, and it'll all talk together and intermingle. Instead it's just turned into, "Nope, everything looks this way now," this almost mechanized version of language taking over. It doesn't care about being human-readable, actually.
Jonny: [21:49] Right. It's this tension that was there from the origin of it. Even the dawn of the term "linked data," as opposed to "the semantic web," is part of this same thing. I feel like we need to at least nod to (we talked about this at length last time I was on here) the Lindsay Poirier piece, "A Turn for the Scruffy," which we both called out as seminal work on understanding the culture of the semantic web.
And it points to, and it's there too on TimBL's website, the fact that the separation of linked data and linked open data from the semantic web was about reclaiming stuff that worked, as opposed to stuff that was perfect: about making a bunch of separate ontologies. Whereas the initial idea was that there's one graph, one global graph where everything is always linked together, and there should be one URI that represents each unique concept, and only one.
Jonny: [22:55] To the point where there are these absurd blog posts. One of the things that's always amazing about web history is that a lot of it is still up, at least on archive.org. There's this blog post, I think from 2009 (I put it in the links as well, though apparently they took down the comment section), where someone from the semantic web world posted about the first time the New York Times had linked data in the web version of the product. And what they'd done is, they'd made some, you know, article that was about...
Jonny: [23:35] Barack Obama and the, quote-unquote, you know, racist controversy, like, "Barack Obama is a Muslim," whatever. So it was an article about that controversy. And so there was an RDF claim that was like, "Barack Obama related to Muslim," or something like that, just trying to describe the contents of this piece of writing.
But then people immediately were like, this is messed up, because that's now a claim on reality. It's not "someone says this"; it's "this is a fact." And that was something the RDF group had specifically designed it to do. So the model of the world that people keep trying to escape from, and then have to return to, and keep trying to escape again, is that when you make a statement in RDF... there's a difference between that and the way the language, syntax, and systems designers thought about it. There are some really remarkable quotes in the W3C archives. I was trying to pull one up earlier: this one from Brian McBride in 2001, so only a couple of years after the project formally launched at W3C. It goes, "RDF is not just a data model. The RDF specs should define a semantic so that an RDF statement on the web is interpreted as an assertion of that statement so that its author would be responsible in law as if it had been published in a newspaper."
Jonny: [24:58] So these are supposed to be, like, legally binding documents in this way, where there is no such thing as an author saying "someone says this," you know, when in reality...
Jonny: [25:12] Everything has an author. Everything has a point of view and a perspective and was said or written by somebody. But it took a while for even that notion to be encoded in the language at all as an expressible thing: adding the fourth item to the triples, being able to say that this doesn't belong to the global graph of everything but is in fact my local system of meaning. But then you have to keep escaping that, because it doesn't actually work. The thing I always come back to is, just imagine if language worked this way, where I have to...
Jonny: [25:54] I want to use a word, and I have to use Jonny's version of this word, so I have to go to jonny.net slash this word, and now I'm referring to that one, and there's no way I can make my own copy of this word. Whereas in the way language actually works, we have these parallel representations of ideas and concepts and words and phrases that aren't the same, or even close to the same, from person to person, or even utterance to utterance. And yet we're trying to express a system of meaning where there is one version of each of these things. Simply, no one would do it. If I had to go to the dictionary every time and look up each person's unique word and use that, or else it was meaningless, it just wouldn't work. I don't want to trail off forever here, but it's intimately related to the tooling problem, where, theoretically... so, one of the authors of SKOS, the Simple Knowledge Organization System, the ontology and modeling system for modeling relatedness and similarity...
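Jonny's "fourth item" is what RDF 1.1 calls a named graph: a triple plus the name of the graph it belongs to, typically serialized as N-Quads or TriG. The minimal Python sketch below assumes hypothetical example.org URIs and two invented sources; the rival-authorship claim is only a stand-in for any contested statement. It shows how the graph name lets data say "this source claims X" rather than asserting X about reality, and what happens when you throw the fourth item away.

```python
# Quads: (subject, predicate, object, graph). The graph URI records who is
# making the claim. All URIs here are hypothetical examples.
EX = "http://example.org/"

quads = [
    (EX + "Shakespeare", EX + "isAuthorOf", EX + "Hamlet_play", EX + "graph/library_catalog"),
    (EX + "Bacon", EX + "isAuthorOf", EX + "Hamlet_play", EX + "graph/baconian_pamphlet"),
]

def triples_in(quads, graph):
    """The triples asserted inside one named graph."""
    return [(s, p, o) for s, p, o, g in quads if g == graph]

def who_says(quads, s, p, o):
    """Graphs (i.e. sources) that contain a given triple, sorted."""
    return sorted(g for qs, qp, qo, g in quads if (qs, qp, qo) == (s, p, o))

def merged_as_global_facts(quads):
    """Drop the fourth element: every claim becomes an unattributed 'fact'."""
    return sorted({(s, p, o) for s, p, o, _ in quads})

print(who_says(quads, EX + "Bacon", EX + "isAuthorOf", EX + "Hamlet_play"))
print(len(merged_as_global_facts(quads)))  # two conflicting "facts"
```

With the graph names kept, the contested claim stays attached to its source; merged into one global graph, it sits alongside the catalog's claim with equal standing.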
Dorothea: [27:07] It's how you do controlled vocabularies in RDF, and it's actually quite functional, quite useful. And if I'm not wrong, I think the Homosaurus is actually based on it. That's your underlying... how you're modeling this stuff.
Jay: [27:20] Yeah, it's SKOS, yeah.
Jonny: [27:21] Yeah, it works pretty well. So you'd imagine that a tool like that, where you're able to say that something is a similar match, or that this is exactly the same as this other thing, would enable this kind of expressive system. And it doesn't, because doing all of those queries and lookups is preposterously expensive, because of the way it's encoded as URIs, i.e. URLs, i.e. I need to hit a web server every time to actually retrieve this item. There are any number of web architectural models it could have taken, but that's the form it took. And so, as a result, it's intimately related to the tooling and the implementation of the technology, in the same way that it's a reflection of the ideas behind it.
Dorothea: [28:08] Right on.
Jonny: [28:08] Yeah.
Dorothea: [28:09] So how are we doing, Sadie? Clear as mud?
Sadie: [28:12] Yeah, just about. I think the thing that gets me about linked data (and I haven't gone to library school; I have just the barest knowledge of cataloging and that kind of thing) is that I'm a very practical, hands-on person. I have to dig into a system to really see how it works.
Dorothea: [28:36] Oh, yeah, totally.
Sadie: [28:38] Every time I have tried to do that, to even think about linked open data, I'm like, I don't see how this is usable. So yeah, like you talked about, there need to be tools to be able to use it.
It sounds like the heart of the problem with a lot of library technology, where, as I keep saying, a very small selection of vendors has a very large amount of control, and they just keep conglomerating. So there's, like, three now. And somehow libraries, who are the ones using the tools, are the most powerless people in the whole ecosystem, right? So a big topic at my work lately, and maybe a tangent here, is: why the fuck are we still using SIP2?
Dorothea: [29:32] Can't blame you on that one.
Sadie: [29:36] I don't know if you're familiar with SIP2, Jonny. It's basically a protocol. The integrated library system, the ILS, is the biggest piece of software libraries use to keep track of all of their stuff, and SIP2 is basically the protocol that passes information between...
Sadie: [30:01] ...these systems, right? A lot of vendors use SIP. Like OverDrive: OverDrive has to know what you already have checked out to be able to enforce your limits, like you can only have five books checked out. So it uses SIP to query that information from your library system, right? It is entirely unencrypted, clear text, and has been its entire life. And SIP2 is different from the IT SIP, which is a VoIP protocol, which causes no end of confusion every time we have to talk to a vendor's IT to figure out how to set something up.
Sadie: [30:43] I just totally gave myself away. If a single one of my coworkers is listening to this, I absolutely gave myself away, because I've had this conversation so many times. But yeah, it's been in use for so long across all of these interlibrary systems, and it's the only one that is actually usable, like, what's the word I'm looking for? Agnostic. System-agnostic. So it's starting to be replaced by a lot of APIs, but each system's API is its own thing.
So you have to wait for others, like, you know, "Oh, we could do this API, we could do" (I don't know if this is true) "Alma's API, but we can't do Sierra/Millennium's API." And so it's just, why the fuck are we still using this? And then we talk to people, like vendors, and they're just like, "Well, what's the problem?" And we're like, it's completely clear text and requires extra tunneling to keep our patron data from being readable over the internet. And I've asked...
Dorothea: [31:46] It's all over the entire internet.
Sadie: [31:47] ...anybody. And looking at the strings, it's literally library card number, full name, address, you know, number of checkouts. It's so ridiculous, and people are still just like, "Well, I don't understand what the problem is," until you talk to an IT person and say it's in clear text, it's completely unencrypted, and they go, "Oh, that's bad." But no library has the power to go to these freaking vendors and say, "You have to figure something else out." Something has to be worked out, but it's going to end up being, you know, OCLC that does that kind of stuff, or something like that. And then, yeah.
Dorothea: [32:30] It'd be nice, right?
Sadie: [32:32] Yeah, they're...
Dorothea: [32:33] Vendor patsies. That's all we are.
Sadie: [32:36] In a lot of ways, yeah.
Justin: [32:38] Yeah, what was it Bree said in the ScholComm Discord? "ACAB includes NISO."
Jonny: [32:43] Yeah.
Justin: [32:44] Yeah.
Dorothea: [32:47] Absolutely.
Sadie: [32:49] So I still don't think I understand entirely what linked data is, but I think I can start to get to it, if you know what I mean. Because, yeah, it's a system.
It's a system to connect data to other data in meaningful ways, and it once had the promise to actually help libraries figure shit out, and it has kind of completely shit the bed on that. Is that accurate?
Dorothea: [33:23] That's completely accurate. I still have tiny little sparks of hope. Oh, I do.
Jay: [33:30] Did we describe why it's called the semantic web?
Dorothea: [33:35] Oh, I don't think we did. Jonny, I'll leave you that one.
Jonny: [33:38] It's a really simple story. The web happened, right? And the web is documents with links between them. But those links are meaningless; they're just the relationship from one page to another. And it's hard to imagine this in retrospect, a web without search engines or any sort of overlay, because basically the way everyone interacts with the web now is either through search or through some mediating discovery mechanism. You don't just go on the web, go to a URL, and say, "Well, I'm here now, I've found the internet." But that's the way the web was designed and the way it was supposed to work: it would be self-organizing. If you go back to the founding, it was like, we will just have people with lists of links on their personal websites, and they will link everything together, and people will find their way from these local nodes of meaning. And the imagination there was always that the web would be super easy for the average person to make a website on, and that everyone would basically have one.
Jonny: [34:50] And that didn't work at all, not even close, not even from the very beginning. Even the ultra-nerds who were on the internet at the very start still gravitated toward mediating platforms like bulletin board systems, etc. So the semantic web was supposed to be a way of encoding computer-readable information into the protocols of the web, and specifically into HTML documents, which are, you know, XML, a dialect of XML. I don't even know how to describe the relationship between HTML and XML. But the idea was that it would be possible both to annotate a given page and to link pages together, so you'd have this coexistence: documents that people read, with human-readable text, and embedded within and between them, statements like "in this paragraph, I'm talking about this person." Then I could go to that person's page and, theoretically, find backlinks to all the times that person was mentioned, or something like that. So that's why it's called the semantic web: we're adding semantics to the web, which formerly was just naked links and documents.
Jay: [36:04] Yep. Like, the computer could understand that Jonny is a person because it knows what those URIs are and what they point to, and it can then tell what the relationships between them are. Not in a way where it knows what a person is, but it knows what this URI is, and if you use this URI, then it sees other things that have that URI and knows that they're people too.
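Jay's description can be made concrete with a small Python sketch: the software never learns what a person is; it only notices that several resources share the same type URI. rdf:type and FOAF's Person class are real, widely used URIs. The example.org resources and the mentions predicate are hypothetical, the latter standing in for Jonny's backlink idea.

```python
# rdf:type is the standard RDF typing predicate; foaf:Person is the FOAF class.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
EX = "http://example.org/"  # hypothetical resources and predicate below

triples = [
    (EX + "jonny", RDF_TYPE, FOAF_PERSON),
    (EX + "dorothea", RDF_TYPE, FOAF_PERSON),
    (EX + "hamlet_play", RDF_TYPE, EX + "Play"),
    (EX + "episode135_page", EX + "mentions", EX + "jonny"),
]

def types_of(triples, resource):
    """Every class URI the resource is declared an instance of."""
    return {o for s, p, o in triples if s == resource and p == RDF_TYPE}

def same_type_as(triples, resource):
    """Other resources sharing at least one rdf:type with `resource`."""
    wanted = types_of(triples, resource)
    return sorted({s for s, p, o in triples
                   if p == RDF_TYPE and o in wanted and s != resource})

def backlinks(triples, resource, predicate=EX + "mentions"):
    """Pages that point at `resource` via the (hypothetical) mentions predicate."""
    return [s for s, p, o in triples if p == predicate and o == resource]

print(same_type_as(triples, EX + "jonny"))  # ['http://example.org/dorothea']
print(backlinks(triples, EX + "jonny"))     # ['http://example.org/episode135_page']
```

Nothing here "understands" personhood: the match is plain string equality on the class URI, which is exactly the point Jay makes.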
Jonny: [36:28] There's a certain amount of magical thinking that, because language sort of works this way, because it's entirely relational and metaphor-based and, you know, the meaning of a word is only sensible in the context of surrounding meanings and in contrast with similar ones, meaning would just emerge. And again, that's sort of true. Language does work like that, through sort of local negotiations over meaning. But you need to have the people there negotiating in order for it to work, and that never really existed. And it sort of points to one of the salient features that is both, you know, eerily prescient and also another one of these critical pieces where we're talking about the missing tools: from the very beginning, there's this 1999 piece in Scientific American from Tim Berners-Lee that was sort of the public announcement of, you know, the existence of the semantic web as a problem. Dorothea: [37:28] I remember reading that. I was at work. I remember reading it. Jonny: [37:33] And so it's this wonderful document, this very pie-in-the-sky kind of system, you know, about what it could be. And there's a bunch of really basic and obvious things where it's like, wow, we should really have the computers work like that. Like, you know, the idea that I have a calendar appointment or whatever. Jonny: [37:57] Why can't my computer know that I also have a photo that was taken on that day? So I can just say, computer, find me the photos that were taken during this appointment on my calendar, or something like that. So, a sort of universal acid for data, where I can just relate, you know, totally heterogeneous systems to one another.
Jonny: [38:19] But the part that's really, you know, come to be, that we're all thinking about with AI this year and this last year, is that it was always going to be dependent on compute. There's metadata there, but even from the very beginning you needed what Tim Berners-Lee was talking about as agents: little bots, little scripts or whatever, that are running around gathering up all of this metadata. And this is around the time when Google and the first algorithmic search engines were starting to exist, so this idea of crawlers ingesting this information and making sense of it was a relatively new one, especially at a mass scale like this. And that's always been the tension. Like, talking about, what is it, where do I touch it, how am I supposed to use it? That was sort of always the intention: you would have a little computer butler thing that would be going out, and you'd have your own set of commands to just say, go get this for me, go fetch this for me. But again, it's never really materialized, because with what infrastructure does the average person have a constantly running bot that goes out and scrapes the web for them all the time? Jonny: [39:37] And so even from, yeah, there are a couple of moments in the history of the Semantic Web when Google basically bought it. That happens actually several times. There's this sort of domestication of the process where now, when you think about it, like, where does it exist? How does it exist?
Pretty much the only way that people usually interact with it is the metadata: the Open Graph metadata, well, Open Graph is slightly different, but the JSON-LD document that you'll have at the top of your website header that uses schema.org terms to say that this is a website about an organization or an event or whatever. And as Justin was saying in the beginning, sometimes it makes the Google info boxes work. And that's pretty much the most concrete realization that the average person has of linked data in the everyday. And that's because who owns the crawler? Google owns the crawler. And so it becomes something where you make metadata available to be crawled by Google in this very constrained, commercially focused context. But it's not a system of expression. Jonny: [40:45] And just one more thing: there are these other technologies, like RDFa, this dialect of RDF, which is supposed to be the thing that goes embedded in documents, where, as I'm writing, I will tag a particular paragraph with some, you know, semantic web tag or something like that. That's arguably one of the most... attempts at making a human linked data interface, where you could imagine I have document editing software or something like that, and I can highlight a sentence and add a tag to it or whatever, actually embedding this in documents that people actually use. That is actually no longer supported by the main RDF parsing library, rdflib in Python, because it's complicated to parse. But also it's just sort of like, that's not really the important one. It's, you know, all these mushy positional document tags and stuff like that, and people don't really want to know the information in context.
They want it all split out into, you know, something where I can do an HTTP request and just get the headers, and that's it. Jonny: [41:54] And so it's just one of these mutating landscapes of technology that always ratchets more and more towards... it's intended for doing the big web of open data that you're not a part of, but that you get to experience through platforms. And a lot of platforms are in fact powered by linked data, or at least, if not RDF, Knowledge Graph™ derivatives of that idea. It is an extremely powerful set of ideas, but not for you. If you are a company that exists as a giant conglomeration of data sets that you've bought by acquiring smaller companies over time, it is an incredibly powerful system for integrating all of that information and being able to do complex queries across it. So in that piece, for Tim Berners-Lee... Jay: [42:45] Not for thee. Jonny: [42:46] Exactly. And increasingly for the surveillance state, and for the people who have this nightmarish multi-sided market of selling your data to insurance providers at the same time as selling it to police, at the same time as selling you back a little slice of it as well. So the way that it exists now is largely in the shadows, and that's by no means a passive effort. There's an active corralling and an active domestication of this set of ideas. Dorothea: [43:19] And to bring it back to tooling for just a second, some of the more pro-social, I guess I will use that word, experiments in this space, like Wikidata, for example, are already running up against the absolute limits of what you can do with linked data if you're not, like, Google. The technical details here completely escape me, but Wikidata has gotten too big for its britches. The infrastructure literally cannot cope with it anymore, so they're sharding it, is my understanding.
They're kind of splitting it down the middle and figuring out how to get the two shards to talk to one another, which I'm sure is really exciting technically, but wow, that's not great for those of us who are not Google but are interested in this technology stack. Jonny: [44:10] Did you see that the cause of this issue is that Amazon hired away all of the engineers of the underlying database software it's running on, Blazegraph? So they're... Oh, great. Yeah. So, all right, typical. So again, this is the big company literally buying the underlying technologies. You know, the software needs maintenance, it needs these constant improvements, and to be able to handle an ever-growing stack of triples like Wikidata, you need to have active maintenance workers. And who pays for open source work? Like, if I'm a software developer and Amazon says, here's, you know, 250K a year to do the thing you were already doing for free, then it's like, sure, I have a family. I'd like to, you know, go on vacation sometimes. And so, yeah, that was another moment of actively poaching away the talent so that the underlying technology can... Dorothea: [45:15] And I will say, for all that we are cultural heritage organizations founded on the idea that culture should persist, we're very bad in libraries and archives at admitting that software needs maintenance, that standards need maintenance, right? That's the SIP2 problem in a nutshell, though that was proprietary, actually. So Ruth Kitchin Tillman and I wrote an article, published about a year ago, about the ethics of linked data sustainability. You can find it open access online. And we took a potshot, actually. Okay, "we." I took a potshot. This one was mine. At information scientists. Okay?
Because there are too many information scientists who are serial project and standard abandoners. They get grant money to do this fancy-dancy thing, and they get as far as it being implemented in libraries, and then they just wander away to write the next grant application and do the next fancy-dancy thing. And then it rots. Jonny: [46:26] Totally. Dorothea: [46:27] Right, whatever they built, it rots, because inevitably they didn't build it right in the first place. And I'm totally thinking about OAI-PMH here, since we have some ScholComm folks in the room, but SIP2 is another beautiful example. Gosh, we are so bad at versioning stuff. It's a really basic idea. You gotta version stuff. You can never get it right the first time. So yeah, I, in that article, took a potshot at serial project abandoners and said, funders, stop funding them. Ask what happened to their last three projects, and if they're dead in the water, add some black mark. Jonny: [47:06] For real. Yeah, this is a general issue in any sort of publicly funded tooling space. I was allegedly on some review panel for some funding agency that is theoretically talking about software sustainability. Jonny: [47:25] And that was a completely novel concept: what we want to do is fund sustainable software ecosystems. We're not trying to start a new project. We're not trying to, you know, fund the new feature. These are the already existing things that are happening in open source, and let's just keep that going, like paying for stuff like documentation and making the tests work and, you know, years and years of technical debt.
And like, security audits. Yeah, totally. Yeah, and please. Yeah. And so this was one of my entry points into thinking about the semantic web and linked open data. I was living with someone who was working in metadata in a library at the time, and there was this increasing cry of, we all know the journal system is broken. And there's this recurring strain of papers that are just sort of like, let's just make the libraries do it. You know, we can get libraries to host a bunch of journal-like things, journal-like overlays or whatever, completely ignoring the reality of work and the reality of bureaucracy in libraries. And so, you know, you wonder who I'm talking about. Dorothea: [48:47] Oh, I don't have to wonder. I let him talk it out. Jonny: [48:49] Yeah. And on the one hand it seems like an obvious thing: of course libraries in general, in the abstract, should be invested in maintaining their catalogs at least, but also all the other things that are being archived and cataloged and exist in libraries, and making that available as a public catalog. Surely they're already doing stuff like that, so it shouldn't be that much additional effort to have an institutional repository that acts like a journal and can link these things together. But as y'all know... yeah. Dorothea: [49:33] I keep coming back to tooling. Jonny: [49:34] Yeah, tooling. Dorothea: [49:35] ...was shit. The tooling for open access is and always has been shit. Jonny: [49:42] Right. Yeah.
And so it's just a matter of, there is this universe where, okay, we could get some of these things aligned, like funding priorities for maintaining sustainable software. Okay. If we can then get some sort of IT consortium to help out with, maybe, you know, quote-unquote public cloud, so it's not the case that every library needs to have an on-prem IT team. There are some of these things that could lock into place that could theoretically make some of this work. But that's just not the way academic work is generally done, and it's just not structured to make these sorts of long-lasting infrastructural efforts. Like you say, it's grant cycle to grant cycle: let's just ride to the next thing. And even within... so part of my role in the last six months of work is that I'm working with actually a lovely group of people who I, I...
Jonny: [50:41] ...like, and they have welcomed me, so I'm not trying to speak ill of them at all, but this is a linked open data project, and basically what I've been trying to do for the last six months is pay down technical debt. There's this really good idea of a way of having authorable linked data schemas that doesn't require you to be part of the priesthood to be able to describe what exists in your reality, but it just didn't really work. The people that are concerned with the modeling part, the "what is this kind of thing, do we put it in this category" part, are not usually the same people who are going to be able to write a really good implementation of that. And so it's also about trying to figure out how to make those collaborations happen. Because this is another point where I don't see this as a thing that really could exist or come from any sort of startup. Rest in peace to the Solid project, which I have been trying to find for several years, and I keep seeing little promising scraps of it. So Solid was the thing where Tim Berners-Lee was like, this will be the semantic web, the thing that we're trying to do. It's like he had a crisis of conscience: actually, the web sort of sucks. Jonny: [52:02] I think around 2015 and 2016 it was, okay, let's try and make Solid a way for people to do the more vernacular dream of the semantic web, where I have my... Jonny: [52:17] ...like, now they're talking about ActivityPods. I have my little unit of the semantic web, my information graph. But that quickly got bogged down in the academic cycle. No one could manage a project. Then they spun that off into a startup.
And wouldn't you know it, once that happened, owning your own data became a bug, not a feature. And so now you're supposed to be pushed on to renting a cloud server for it, and so on and so forth. So I think that this doesn't come from startups or from any sort of company. It also doesn't come from the scattered wastes of the open source world, because you can't just ask people to do it for free. And it also doesn't come from these local efforts of trying to make tools for an individual institution. And so what's left is, you know, we need to use some sort of public funding, and try to rally public funding in a way that it's not designed to be allocated, in order to make these kinds of technologies. And we also need the belief that there should be these technologies in the first place in order to make that real. Jonny: [53:22] And so this is this unending knot of, who is the next little thread that we need to pull in order to make this large tapestry?
But then you're dealing with 25 years of baggage at the same time. A lot of the people that are still in that space have either distanced themselves from it (and I have) and look back on it with mixed emotional memories, "but I don't want to touch that anymore," or they're in some way still true believers: "What do you mean? Nothing is actually broken. It's totally fine. You just need to learn how to do it good." And so this is one of the reasons why, like we were talking about earlier today, in some ways, talking about serial project abandoners and protocol abandoners, there needs to be a break in a way that's backwards compatible. We bring the past with us, or have some way to carry it through with us, but we're not beholden to all of this baggage. And so, talking about what happens in the future, I guess I don't know if we've even gotten past the... Jonny: [54:38] ...expository part of "what even are we talking about" yet, so maybe I'm jumping the gun there. But yeah, just last thoughts on that idea: that's another part of the twin entry points for me into this whole line of thinking. One is thinking about what could be an alternative to scholarly communication and publishing. It should be possible for me to throw stuff up on the web and then have it be part of this sort of blob of information without a lot of gatekeepers in the way. The other part of it is that, even long before I got interested in it, I kept coming across these various graveyards of things that are just like...
Jonny: [55:17] "This is a really cool idea: a browser extension where, everywhere I go, I can make personal annotations, not just bookmarks, but I highlight this section and then I can relate it and share it with my friends." Like, oh, actually, that extension was for Netscape 6.0 and was abandoned 20 years ago, and no one has thought about this ever since. And there's this long string of dead projects that are exactly like this. Because, as you'd imagine, the kinds of open source projects that work and are sustainable are usually ones that have some material, tangible benefit for the people that use them day to day ("this is a tool I have active use for"), or they're baseline, behind-the-scenes infrastructural work that a lot of companies rely on. But for this niche of technology, what you have to have in order to use it is a website. So that rules out 99% of all people. And then it has to be a website where you are deeply in control of the HTML that goes on that page, and that rules out 80% of the remaining 1%. Jonny: [56:29] And so there never was a time when it had an actual practical use. And this is something that gets called out as early as... the earliest I've seen of people saying "what is the point of all this" was in 2005 and 2006, where there's a series of these blog posts about abandoning the semantic web. It's like, no one actually figured out why we're doing this at all. There's one interesting example of music annotation, where it's a sort of peer-to-peer-ish music system, and then that's it. The rest of it is totally pointless.
Like, why would I ever invest all this time into learning these incredibly complicated parts of it? Because one of the things that we're missing in the exposition section is the sort of stack of things that the data is. You have the triples part, which we talked about, but then you also have ontologies and schemas and the way that these things all relate. It took me a year to even figure out what these meant, what they look like, why they existed, and why a schema is different from an ontology. That seems like the same sort of thing, but there are different roles in the ecosystem and also definitely different... Just to say that... Dorothea: [57:55] Why does neither of them have record constraint language? Jay: [57:59] Ontology means that your professor goes on tangents about first-order logic when you're learning it. Dorothea: [58:04] That's right. Jay: [58:05] Yeah. Justin: [58:07] And schemas are on schema.org. Jonny: [58:09] Exactly. Justin: [58:10] That's how you know they're schemas. Jay: [58:11] Also, was the music project you were talking about Linked Jazz? Jonny: [58:14] I will look this up. It's in this blog post, "Abandoning the Semantic Web." I'll see if I can find it. Jay: [58:21] Linked Jazz rules. Rules, yeah. Dorothea: [58:24] That's a great little site. I love it. Jay: [58:26] That was the first I ever heard of linked data. I was an undergrad, still working in a music library. Sure. And my mentor, not professor, my mentor, like, boss, was like, this is the coolest thing I've ever seen in my life. Dorothea: [58:42] Well, and music in particular, in a library context, is actually a really wonderfully subversive place for linked data to get a foothold, because MARC for music...
Jay: [58:55] So bad. Oh, it's terrible. Music cataloging, like music copyright, is something that even seasoned professionals will not touch. Um, yeah, music cataloging has its own rules. I mean, heaven. Dorothea: [59:12] But wow, MARC was just not designed for that, and it shows. Jay: [59:16] Oh, it shows. It shows. Yeah. Justin: [59:23] Back to the explaining part of things as well: one of the main benefits always sold about linked data is that, since the web is a page- or document-focused way of sharing information, this would allow subsets of information to be pulled, like Jonny said, pulling all the headers from an article with a request. The thing is that without... like, I could pull 9,000... Justin: [59:51] ...I don't know, 500 fields from a MARC record. What do I need that for? Because I don't know anything about the context of it without the full document. Plus, I'm guessing that's probably why it's so computationally heavy: everything has to be done through servers, whereas documents can be retained locally, and they're mostly just text files, right? So it's sort of the same problem blockchain had, where everything had to be done computationally, and that's why it took 20 minutes to buy a donut, because it had to get pushed out to like 20 ledgers. And instead, this is like, if I want to query information, it has to go through different servers, which I think was kind of the idea of websites that heal. I have it pulled up; it's a John Rhodes blog post. When Jonny was talking about bots, I think that was the idea: link rot would happen between websites, and eventually bots would just communicate server to server constantly and fix links, and the websites would heal themselves. That was kind of the idea, and that blog post ended with "if anyone wants to write this, I'll help," but until then... But that's the thing: it's very difficult to do that, because if you've ever worked with, like, government websites...
Justin: [1:01:04] ...particularly healthcare websites: every presidential administration, stuff moves, entire divisions of the government move, and so they're on completely different domains. That's why government websites always break, and really important ones. And that's also why the government tends to do a lot of, like, dot-coms now, where it's just like, healthcare.com, okay, just go there and we'll point it wherever it ends up. I was an allied health librarian, and trying to keep those pages about the Affordable Care Act up to date in LibGuides... I mean, thank god it has a very good link checker, but I constantly had to run that link checker, because those things broke all the time. Jay: [1:01:47] They don't even keep their PURLs, or whatever it is that they use. One year in grad school I was the GovDocs librarian graduate assistant, and half of my job was going through SuDoc stuff and also checking the PURLs, or whatever permalink system government websites and online GovDocs use, and finding all of the broken ones, which was all of them. They don't even maintain their permalinks. Yeah. Which is the point of permalinks: so that the back... the URL itself can change. Dorothea: [1:02:24] Well, can I rag on OCLC again? Jay: [1:02:28] Yeah, always.
Dorothea: [1:02:30] Yeah, that was actually another example that Ruth and I wrote about in our piece: OCLC and purl.org, which was not originally OCLC's. It was a grassroots little thing: okay, here's a place where you can mint permalinks, and we'll keep the database of where they go, and everything will just work, happy permalink utopia. And then, with absolutely no warning, some years after OCLC took over purl.org and made a very loud statement about how it was very important and they were definitely going to maintain it, it broke. They broke it. I don't know the details. I think the person who had been maintaining it left, retired, who even knows, but purl.org just completely broke. OCLC of course didn't give a fuck, and it remained broken for several years. Eventually the Internet Archive took it over, and they don't give a fuck either, so you can't actually get any support for it. And a bunch of innocent third parties who believed OCLC's lies and gleefully minted all kinds of PURLs, because they thought that infrastructure was going to stick around, got burned. Dorothea: [1:03:56] Right? This idea that Justin, I believe, was talking about, of self-healing websites. Dorothea: [1:04:02] Right, that is nonsense. That is garbage. The world does not work that way. The world needs maintenance. Jonny: [1:04:09] Yeah. And so there's this whole nest of ideas about roads not taken in the internet with a lot of this, because I have the same feeling about permanent identifiers as I do in general when I see yet another platform for scholarly communication, or "we're going to fix the ills of academia by making yet another platform": this is intrinsically a political one. It's one where you are putting power in the hands of a specific organization, and the longevity of that is strictly social.
It's the same way that permalinks exist only as long as the organization exists. And so I have, in general, sort of more faith than average that archive.org will continue to exist in the next year, although they are sort of damaging that reputation lately, like, you know, anyway, we won't go there. Jonny: [1:05:11] I think that they have good longevity plans for their archive of the web, okay. And I also in general think that the DOI system is probably not going anywhere, but that's largely because it's, you know, one of the mechanisms for extracting billions of dollars from public funding every year. So there are social reasons why these things persist. But there's the major road that was not taken. As you're saying, the web doesn't work in a way where it would be possible to do self-healing websites or self-healing links, because it's designed to be client to server: you go to a place and get something that someone else controls entirely, and you're not actually supposed to have any agency in this world. And there are good reasons for that, don't get me wrong. But this is one of the true things about linked open data: it needs to be peer to peer. The way that it could conceivably work is as a peer-to-peer system where it's possible to do efficient querying and caching between a bunch of different peers, so it distributes labor that way, instead of everyone having to go and hit this one server to get this one URI that represents this core concept or whatever every time someone updates a link or makes a new record. Jonny: [1:06:35] And so as long as that doesn't exist, there's this duality in this beautiful idea of basing the semantic web and linked data on URIs.
It's this elegant simplicity of the idea that the identifier is actually a location, that location and identity are the same thing. When I go to that location, I'm supposed to get something useful from it, and then that allows me to go to the next thing. That's a wonderful, wonderful idea. But in reality, it doesn't work at all, because identity and location are not the same thing. For one thing, identities change. There's this classic thing that everyone always references on the web: "Cool URIs don't change." That's another Tim Berners-Lee classic. Actually, all URIs change all the time. And when you have a polemic trying to force something to behave in a way that it doesn't, rather than adapting to the reality of that thing, you buy yourself an infinite failure. And so one of the, there's this... Raising your hand. Jay: [1:07:56] I just want to jump in. Yeah, we do the raise-hand thing so you can keep going, and then when you're done, Sadie will say something. Jonny: [1:08:03] But also, just interrupt. I actually would start trying to make some notes to organize this thought, because this is a long idea. So, like, yeah. Sadie: [1:08:12] Oh, I've been thinking a lot about "the purpose of a system is what it does." Jonny: [1:08:20] Completely. Sadie: [1:08:21] Right. Not what it thinks, not what it was designed to do, because we all know how design goes awry. But yeah, the purpose of a system is what it does. Dorothea: [1:08:34] Right on. Sadie: [1:08:35] I don't remember where I saw that. I love systems theory. Jay: [1:08:38] Yeah, right. Sadie: [1:08:39] Right.
Jonny: [1:08:39] So if anyone has ever maintained a website or any sort of web technology: if the intention of this thing is to be liberating and freeing, it certainly doesn't feel that way. You know what it would take to actually maintain a URL forever. If that's the way the web is supposed to be, if the purpose of the web is to put these documents on the web, it doesn't do that. So yeah, exactly, the purpose of the system is different. And again, thinking about all the ways that technical development has been stunted by the commercialization of the web, which precluded a lot of these things from existing, it's not an accident. So one of the ways that linked data is working en masse right now, in a pretty invisible way, is the fediverse. This is what we were talking about the last time I was on here, so I won't belabor the point, but that's built on linked data, at least in the abstract. And this is a sort of fascinating realization of that, because, for example, Mastodon, the largest implementation, does not actually use linked data as its internal data model. That's all a Postgres database that it then synthesizes JSON-LD out of. There are benefits and trade-offs: as a result, it doesn't do all of the linked data parts of what ActivityPub was supposed to do. Jonny: [1:10:09] But there's one other major alternative to this, which is Pleroma and Akkoma, the fork of Pleroma, that is based on a graph database. And that can do a bunch of really interesting things.
But it's also always crashing, and it's sort of hard, too, because, you know, social networks are networks, so they're easily modeled by a graph. Doing something as simple as... there's this notion of containers and ordered collections and stuff like that in ActivityPub. And one of them, and I obviously have lots of feelings about this particular spec, but one of them is that I have...
Jonny: [1:10:53] ...this notion of who I'm addressing my message to, and I should be able to address it to whoever I want. I can address it to this one controlled vocabulary term, Public, and that's like, I'm sending it to the world. But it should also be possible for me to have collections of people, and I can address it to this collection of people. So in that way I have a graph, and then that graph is...
Jonny: [1:11:18] ...modeled, and all the relationships are modeled within ActivityPub as being like, I'm allowed to send it to these people, and I want to send it to this subset of them in this particular case. You can do stuff like that in Akkoma. The UI for it is a little less than desirable, but that's not something you can do in Mastodon, where each one of those addressing features has to be carefully architected as a database query. So there's this tension: if we try to do it the semantic web way, it has the beautiful possibilities, but it's really hard to implement. And one of the things that was hardest, that was an extremely big reach and was really only made to work by the sheer hegemony of Mastodon, as the thing that, if it does something, everyone else has to adapt around it, is implementing editing. Think about it: I have a post, I want to edit that post, and that means I have to propagate
that new version out to everybody else. So, thinking about what it would take to have these sort of self-healing websites, or the ability for the web to adapt to change: you need to have the expectation that for everything I know about, I should be able to receive changes and propagate those among people, in the same way that rumors and horizontal information transfer work generally.
Jonny: [1:12:39] "Oh, I heard that this new thing happened," and I tell my friends about it. And doing so in a way that's actually safe, and that is resistant to counterfeiting, is a remarkably hard thing to retrofit into a system.
Jay: [1:12:58] How do we make the web actually rhizomatic?
Jonny: [1:13:00] Yeah. And this goes back to the dawn of the web browser and what it is as a technology: this idea of the read-write web, where it should be just as easy to write as it is to read on the web, obviously controlled by permissions in some way. That experiment basically died when Netscape won the early browser wars, but then it persisted in the form of wikis and this notion of soft security. How do we make that work? We allow stuff to happen, but make it so it can't damage the system in some profound way. If someone does something they're not supposed to do, say someone goes and vandalizes a Wikipedia page, then sure, the next person who loads that page might see a bunch of vandalism, and that's bad, but it doesn't ruin the page. It doesn't break it forever. It's possible for me to revert to the old version of it, and so on and so forth.
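A minimal Python sketch of the addressing and editing Jonny describes, assuming plain dicts stand in for ActivityStreams JSON-LD documents. The Public collection URI and the Create/Update activity types come from ActivityPub; the helper functions, and any actor or collection URLs used with them, are hypothetical, and the authorship check is only a crude stand-in for the signature verification real servers typically layer on top.

```python
# Sketch only: plain dicts stand in for ActivityStreams JSON-LD documents.
# Helper names are hypothetical; actor/collection URLs are up to the caller.

# The special collection ActivityPub uses to mean "send this to the world".
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def make_note(actor, note_id, content, to, cc=()):
    """Wrap a Note in a Create activity addressed to the given audiences."""
    note = {
        "id": note_id,
        "type": "Note",
        "attributedTo": actor,
        "content": content,
        "to": list(to),
        "cc": list(cc),
    }
    return {"@context": AS_CONTEXT, "type": "Create", "actor": actor,
            "object": note, "to": list(to), "cc": list(cc)}


def is_public(activity):
    """True if the Public collection appears among the activity's recipients."""
    return PUBLIC in activity.get("to", []) + activity.get("cc", [])


def make_update(create, new_content, actor=None):
    """Re-send an edited copy of the object (same id) to the same audience."""
    edited = dict(create["object"], content=new_content)  # copy, don't mutate
    return {"@context": AS_CONTEXT, "type": "Update",
            "actor": actor or create["actor"], "object": edited,
            "to": list(create["to"]), "cc": list(create["cc"])}


def apply_activity(store, activity):
    """Receiver side: keep the latest copy of each object, keyed by its id."""
    obj = activity["object"]
    if activity["type"] == "Create":
        store[obj["id"]] = obj
    elif activity["type"] == "Update":
        known = store.get(obj["id"])
        # Ignore edits to objects we never saw, and edits sent by anyone other
        # than the original author (a crude stand-in for signature checks).
        if known is not None and known["attributedTo"] == activity["actor"]:
            store[obj["id"]] = obj
    return store
```

Because the edited copy keeps the same `id`, a receiver can swap it in place. Everything that makes this hard in practice (delivery to every inbox, signatures, edits arriving out of order or never arriving) is deliberately left out.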
So that's a radically different political vision than most of the web stack that we're familiar with.
Jonny: [1:14:20] Ultimately, for this technology to work, it needs to be constructed on a different set of political primitives, ones that include other people existing and being able to do stuff, in a way that is very uncomfortable for most of the people who design web technology nowadays, who think of it as "I'm going to design a platform that I administer for other people." Instead, think about it as stuff that is designed so that you get out of the way. The most successful technology for enabling semantic web stuff is one that no longer requires the developer to be there and allows people to actually have autonomy on computers. But again, there's no percentage in that; it's in fact anti-profitable. So it's very difficult to organize that kind of vision, not only a technical vision but a social vision as well.
Jonny: [1:15:17] Yeah. I always end up back in wiki world. It's some of the most lovely parts of the web, as far as I'm concerned. I'm still curious if I can find this linked data music project, because that's also something I'm interested in. Oh, so, I don't know. I feel like the thing I think about is survivable web technology.
I always return to pirate networks as the things that can exist and do survive on the web. What are the longest-lived things on the internet? There's the W3C website, they just sort of win, but other than that, pirate networks are the other major answer. Some of those MP3s that were released on Kazaa or something like that are still floating around. Compare that to the extreme adversarial conditions, with the entire global intellectual property regime bearing down on you, and still it happens. Why does that work? To some degree it's a technological question, but it's also a social question, because people take it as their responsibility: I see myself as an active participant in this system, and so when my pirate site gets shut down, I go to the next one and put everything back up.
Jonny: [1:16:43] So, yeah, anyway, you've got to love the pirates, although there are huge power and political problems in those circles as well.
Jay: [1:16:51] Librarians need to read that "How to Form an Affinity Group" zine and go from there, see what happens.
Justin: [1:17:00] I mean, I was...
Dorothea: [1:17:01] It's as likely to work as anything, really.
Justin: [1:17:04] Yeah. I think one of the practical reasons linked open data is always difficult is that all files are local files, in the same way that all history is local history, because it's always local to somewhere.
Justin: [1:17:19] Any time I try to think of it, particularly since you mentioned EADs, there used to be a lot of stuff in the EAD literature about why no one shares their local authority files.
Like, say John Fox Smith donated to the library, and we have his name authority file in our EADs, but he doesn't have a Library of Congress name authority because he wasn't famous enough. Right. So everyone's got their... Right. Right. He just had a bunch of money. Right. So we have all of these people who are local in our local name authority files, and they never, ever get shared; they always stay siloed. And there is almost no solution to it, because of the amount of labor it would take to disambiguate the names of people who have common names (is this the same person?), and then who's going to do it, too, because they barely have enough staff in special collections anyway. So who cares if every local donor gets their own name authority file? And I think another thing is, like Jonny mentioned, the way Jonny uses a word would have to go to a URI. It's kind of like when we were talking about taxonomy last week (that episode isn't out yet), sort of like the issues with taxonomy for animals and everything.
Justin: [1:18:44] You need smaller sets of words, not bigger ones, in order to actually make it useful for humans. When I was working with the bird working group, it was like, everyone keeps using too many different words. All we need to solve this problem is a short list, and then we can use that as user-submitted metadata and tags. All we really need is to agree between us humans that we're going to use the word paleo-ornithology instead of archaeo-ornithology. That's all we had to do, get people to agree to that. There's not really a technical solution, because, you know, if the entire bird working group of paleo-ornithologists were all on a boat and it sank, there wouldn't be a bird working group.
Justin: [1:19:33] Right.
So it's not too difficult; it's not an impossible political solution. And what I always keep thinking about is, we have all these documents. Yeah. It would be nice to break things up into data and share it as linked data, but as an organization, you don't really need to, depending on the size and scale. That's why so many libraries have their own. When I think of how a library is organized, ultimately the reason MARC is like that is its access points, and what we always default back to is: what's the access point for this? And I don't really care...
Justin: [1:20:16] ...semantically how the data works, as long as this is a subject area, this is the title, this is the author: how do I get to the information in the quickest possible steps? And that's where the disconnect has always been for me with linked open data: when is this going to help my users in my library? It's easy for me as a ScholComm person, because I'm the only person who's like, "No, I want this out everywhere in the world. I want everyone to look at this." But everything else in the library is categorically organized around how people in here find the stuff they're looking for, and I'm the only one who has to flip that and say, how do we get what's in here out to the world with no barriers and restrictions and logins?
Jay: [1:21:07] Yeah, like, was it last year, maybe a couple years ago, I was part of the PCC ad hoc group that put out the final decision about, hey, maybe don't put gender in name authority files. There was the initial one, and then a lot of people got mad at that one, and then I was part of the ad hoc "hey, let's revisit this." Thank you for your service. And one of the final sticking points...
Jay: [1:21:38] Most of us were on board with, maybe let's just not. It's too complicated to come up with consistent language or ways to do this ethically that aren't going to hurt trans people, who were mainly who we were thinking of. But there are other reasons why you might put gender, and some of the reasons were like, with Asian names, sometimes it's hard to disambiguate. And I'm like, that's racist.
Jay: [1:22:05] Like, that's just lazy and racist. But the big one, the final sticking point, where we were like, maybe there's a point here, but ultimately no, we don't care, was: in a linked data environment, people could query books about XYZ written by trans authors. For example, you can do a SPARQL query against Wikidata to pull all of the towns that currently have female mayors, or whatever is usually the example they use when they tell you what SPARQL can do with Wikidata. What if you could do that with a library catalog? Whoa. And we had to be like, yeah, but no, not discovery layers. Primo doesn't even do that yet. No discovery layer that's popularly used by academic or public libraries right now has that capability. They might have linked data in the records, and they might have APIs exposed if you have a developer who can do neat shit, but ultimately that's not how those searches work right now. So maybe it will be available in the future, but for right now we don't care, and that's not the purpose of name authority files. So, right, yeah.
Jonny: [1:23:22] The question of what it's for, what the point of it is, why would I do it if there's no use, is also ultimately about beliefs about how things are supposed to be designed.
Like, is the goal to be able to get an exhaustive and true answer of all of the cities that have a woman as mayor? Is the point of what we should be doing with the semantic web to make the correct information exist in a unified vocabulary? And, spoiler alert, I don't think so. Partly because there's no such thing as the authoritative and complete true archive of all knowledge, but also...
Jonny: [1:24:09] ...well, that's an impressive technical feat that I could put on some sort of tech specs document, that my query engine can produce 10 billion triples in one second, but what's the point of that? Think about it in the context of language. It's also related to the notion of ontology curation: how do we come to know which term is the one term to use? That's only an important question if the goal is to make everything totally uniform, and if the act of searching is relatively precious and hard to do, like I can only do one of these, rather than an iterative process of exploration. It never works that way with language. Say a new phenomenon exists in the world: we need to get the council of languages together to agree on the one word for that, and then everyone from then on has to agree to only use that word to refer to that phenomenon? That's never how it has happened, and it never will be. Instead, there's this local interpretation of what's happening in my immediate reality: you try to use this word, and is it effective when I say it this way?
Jonny: [1:25:32] "Oh, what I'm talking about is this." "Oh, I know it as this." There's this negotiation over what things mean, in what context, and to whom, and being able to have your personal vocabulary and ontology, like, as the history of your browsing...
Jonny: [1:25:48] ...I've come to know that these terms are the same terms, or when I am in this neighborhood of semantic space, I use this word instead of that word. And...
Jonny: [1:25:59] ...then you can imagine the collective power of something like that. Okay, all of my friends know these words as being the same, so in general I can ask around and say, "Hey, I'm looking for this. Does anyone know how I would refer to that?" Being able to make sense of things as an iterative and social and interactive process...
Jonny: [1:26:21] ...not one that's done once, as if it were a database query with a very controlled database schema that's known in advance.
And so it changes our expectations for what technology should look like. I don't go to the vast impersonal search engine that indexes the whole web; instead I have to actively cultivate a set of nodes and friends and relationships and prior acquaintances with this kind of thing, and then expect it to take a little bit of time to find stuff. I know that sounds counterintuitive. I'm not saying it to create exclusion or inefficiency, but the goal of the system isn't to produce maximally true, maximally numerous, and maximally cleanly organized data all the time. I can imagine... thinking about what you were saying about why everybody doesn't share their local... Actually, I'm not familiar with this term, authority file. I assume that's like a local reference, like subject...
Jay: [1:27:37] ...headings, or if you publish a book, how your name is in the Library of Congress. That's an authority file.
Jonny: [1:27:43] Gotcha, yeah. It's also one of those things: who gets to do that? It's the same problem as libraries and museums being the sites of pillaged cultural artifacts. It's not your job and not your role to be the purveyor of this information about this person, and it becomes your role because they have no means of doing so themselves.
Jonny: [1:28:10] These systems aren't ones that can be touched by the average person. I can't deposit a book myself in the Library of Congress.
I need some intermediary force. So there's another part of why it doesn't happen and why it doesn't work, because on the other end: who is it for, and should we even do that at all? Same thing with what happens when you need to change your deadname in all the bibliometric records. How does that happen? Yeah, I freak all my software friends out when I talk about eventually needing to write the anti-performance manifesto. Someone who is a friend on the fediverse, and we talk all the time, was sort of horrified: "What do you mean? Software should be delightful to run." And, yeah, that's not exactly what I'm referring to. What I'm talking about is the "we need to get page load time down to two milliseconds or life will be lost and meaningless as we know it" stuff, as a set of ideological commitments, rather than making stuff usable by people. Oh my god, I'm opening this. You have an authority file, you have an official URI.
Jay: [1:29:33] I do, I have a URI, and I'm part of the problem.
Jonny: [1:29:39] We all have many URIs. Yeah.
Jay: [1:29:43] I helped write a book in like 2018, during my first job, hell.
Jonny: [1:29:49] Yeah. So one of the interesting things that I think Bluesky and AT Protocol have done is make it so that domains are sort of meaningful as identity. Yeah, that's cool. I have a domain and control over a domain, and that gives me a source of identity, even if it doesn't give me control over the computers that host the thing, whatever, we'll talk about that a different time. It's very interesting that that has resurged and is actually genuinely useful.
And I think one of the best ideas to come out of it is actually using those URIs and URLs as, literally, this can be my name.
Dorothea: [1:30:31] Yeah. Because it's language independent, human-language independent. And things like deadnaming, which we have to deal with in the authority file environment because it is predicated on names: it's just a URI, you don't have to do that. You can attach any name you want to it, so there's...
Jay: [1:30:51] Definitely, that's the good thing about URIs: it allows the flexibility for trans names or any other kind of name that might change. Absolutely, that's the good part about them. Love URIs.
Dorothea: [1:31:02] That's one thing that I want to keep out of all this nonsense. URIs as identifiers was a genuinely clever and useful idea.
Jay: [1:31:14] Yeah, it was a big deal when the Homosaurus moved from having the terms be the URIs to having alphanumeric URIs, so that we could change terms as language use changed.
Dorothea: [1:31:25] Yeah, love it.
Justin: [1:31:27] Did they ever tell you, don't put semantic information into URIs? Everyone does it. It's so stupid.
Jay: [1:31:35] We're queer. We don't listen. Fuck you. Um, doi.org...
Justin: [1:31:46] ...slash my journal volume one and...
Dorothea: [1:31:50] Yeah, if you ever meet Geoff Bilder, who works at Crossref, wonderful human being, he has many, many rants about publishers coming to Crossref wanting to change a DOI prefix because they merged with another publisher, or changed publishers, or whatever the hell, and he's like, "No, that's not the point."
Justin: [1:32:12] They have a suffix generator now. It's literally just a spreadsheet that generates a suffix, but they're like, "Use this, idiots."
Jay: [1:32:22] Yes, please. Is that like half your job, Justin, just being like...
Justin: [1:32:26] No, I don't mint... I mean, I don't mint DOIs manually, usually. But the thing that always bugged me was that OJS used to put semantic information into the automated strings it would create. It would put V and then the volume number, and then N and then the article number. And I was like, don't do that. Just put random numbers. Just a random number generator. That's all you need to do. But they didn't do it until the latest update, so now they do it properly.
Dorothea: [1:33:01] Where you can do what every single baby relational database administrator knows to do, and just count.
Jay: [1:33:10] I don't know how to count. I'm gay, as we've learned from the Homosaurus.
Justin: [1:33:14] I do have an Excel sheet of manuscripts and databases, and it's just zero zero zero zero zero. What is this? Yeah.
Jay: [1:33:26] What happens when you go beyond the capacity of how many zeros you picked?
Justin: [1:33:32] Doesn't matter.
Jonny: [1:33:33] Okay. All of these things have their times and applications and usages. Just do all of them and make them all point to the same thing, different things, etc. Sequential numbering works, but there are times when you don't want to use it, like when you have potentially personally identifying information and you don't want someone to be able to enumerate over all possible things and find all the stuff on the server. And, spoiler alert, universities do a terrible job at this, and frequently will just have very sensitive documents hanging out that can be publicly enumerated on their public web.
Jonny: [1:34:19] But, you know, it's super useful when designing some systems, in the same way that having totally anonymous strings is super useful in PID space.
But then you want semantic URIs in some other context. Do all of these things. And the other one is content hashing, where the identifier is intrinsically based on the content of the thing. So if I have the thing, I know how it would be called everywhere in the world. That has its own benefits and trade-offs. That is one of those dangerous ideological territories where you get pirates and also cryptocurrency zealots in the same room, and it becomes this maelstrom of the same idea meaning completely different things to different people.
Jonny: [1:35:09] But yeah, we're not going to solve the identification problem. Basically, it's the rigidity, being able to use only one thing, that is the problem to me.
Justin: [1:35:21] Yeah. Now, I don't have a Library of Congress name authority file, though. Someone from Florida with my name, born the same year as me, does, which is confusing. There are so many people with my name. I went to high school with someone with my name. It's very confusing. It doesn't seem like it should be that common.
Jonny: [1:35:39] It makes you harder to dox, though, so that's like passive self-defense.
Justin: [1:35:43] It is really good. I have successfully scrubbed my information off the web several times. It's not hard. One time I couldn't do it, so I just redirected it to another dude with my name. I changed my address to his.
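The identifier styles traded back and forth in this stretch (setting OJS-style semantic strings aside) are counting, opaque random strings, and content hashes. A minimal Python sketch of the three, assuming a record is just a `bytes` value; the function names and the `sha256:` prefix are hypothetical labels for illustration, not the syntax of DOI, ORCID, or any other registry.

```python
import hashlib
import itertools
import secrets

# Sketch only: function names and the "sha256:" prefix are illustrative
# labels, not the syntax of DOI, ORCID, or any other real identifier scheme.

_counter = itertools.count(1)


def sequential_id(width=6):
    """'Just count': short and simple, but anyone can enumerate 1, 2, 3, ..."""
    return f"{next(_counter):0{width}d}"


def random_id(nbytes=8):
    """Opaque random string: carries no meaning and can't be guessed in order."""
    return secrets.token_hex(nbytes)


def content_id(data: bytes) -> str:
    """Derived from the bytes themselves: anyone holding the same bytes gets
    the same identifier, and changing a single byte changes it."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def mint_all(data: bytes) -> dict:
    """Give one record all three kinds of identifier, all pointing at it."""
    return {ident: data for ident in (sequential_id(), random_id(), content_id(data))}
```

None of these is the one right answer: counting invites enumeration of anything sensitive, random strings are safe but meaningless to humans, and a content hash changes whenever the record is edited, which is one reason a record might carry more than one identifier.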
Jonny: [1:35:59] And I feel like this is something Dorothea would probably have stronger thoughts about: the notion of privacy when it comes to linked open data and stuff like that. We don't want all the world's information to be public. We don't want the Justin authority record that includes your Social Security number and phone number and everything like that. There are limits to openness; there needs to be some amount of fungibility. Yeah.
Dorothea: [1:36:32] I'll actually give you a real-world example. If you go and look at my Wikidata page (you can just go to wikidata.org and look up Dorothea Salo; I'm the only one, as far as I know, that has ever existed, so what you find will be me), although I identify as, I'm cis female, that is how I identify, that's who I am, my Wikidata page actually says no gender, "no gender recorded." The reason for that is that Wikipedia, with which I have a very vexed relationship, runs through Wikidata every now and again to do things like make lists of people who maybe should have Wikipedia entries but don't. And of course they do this for minoritized and underrepresented populations, and of course Wikipedia is well known for having a huge gender disparity coverage problem. So I get sucked up into those lists, and nobody asked me. I do not actually want a Wikipedia page, thank you very much, and I would rather not have one. So I changed my gender that is listed on Wikidata.
Justin: [1:37:42] I did not actually change...
Dorothea: [1:37:42] ...my gender, that's...
Jonny: [1:37:44] Dope, like anti-bot action.
Dorothea: [1:37:48] Yeah, it did seem to be the only option for saying no, don't make me a Wikipedia entry.
Sadie: [1:37:57] The privacy of it.
Dorothea: [1:37:59] Pretty...
Justin: [1:38:00] ...much. Gender opsec.
Jay: [1:38:03] My gender is...
Justin: [1:38:04] Fuck off. Get this gender working for me. Yeah, no, that's why I also like ORCID iDs, because it's a very nice system that you get to control. You get to write your name how you want it, you can write it in multiple scripts, and it's just an ORCID, and it will point to whatever you tell it, so you can change it whenever you want. That's what I really like about it. That would be something very nice to use for local archiving and stuff like that, but the reason it doesn't happen is that no one's going to bother to do that. Nerds will do that, but I couldn't even get faculty to do it, even when it would save them time in the long run or solve headaches. If they have a double-barreled first name and people keep putting their second first name as their last name, it would solve that problem for them, but they don't go sign up for an ORCID.
Jay: [1:39:02] I was actually cited in the ethics in name authority files book, one of the chapters, and they asked how I wanted to be cited. I was like, I would like my ORCID, because they were citing one of my articles or my thesis or something that had my deadname on it, and I was like, I want you to do it this way, and I want you to have my ORCID in there, so that it's collocated and properly links back to all of my stuff. And I think it was Brie, actually, who then went on to write an article and talk about how I asked to be cited in that book, using ORCIDs and URIs and linked data as a way to help trans people who have maybe published under deadnames, if they don't want to go back and ask for it to be changed, which I don't. This way I can have people cite me and just use my first initial, and it points back to my current stuff and everything I've done with my current name, while also still being like, but I'm also the person that wrote that.
Yeah, it's not that hard.
Justin: [1:40:08] Yeah, especially if you use initials. I use my initial a lot because I do have a very common name. I used to write my full middle name, and I don't do that anymore, so it's nice to be able to be like, okay, I published my thesis with my full name, but now I only like using my middle initial. Yeah. And now I'm at an institution where I'm the only one of me, so I don't even have a number after my name. I was very excited when I got my email assigned to me, because there is now someone else at my university with my name, so there's like a 01 now, and I'm like, ah, finally got there first. I used to get detention because some dude had my name.
Jonny: [1:40:44] Are you serious?
Justin: [1:40:46] I'd get his detention, yeah. They used to put out a roll with the names at the beginning of the period; teachers had to check them, and if you were on the list, you had to go to the cafeteria. So I kept getting called into the cafeteria because it wouldn't disambiguate my name.
Sadie: [1:41:00] I had that happen to me too. My birth last name (I changed my last name when I got married) is Johnson. So not only are there 70 billion S. Johnsons out there, but I have a cousin who has almost the exact same name. We're practically the same exact person, right? Same first name, same last name, and neither of us uses our middle name, right? Yeah. So I got told I was supposed to go to detention a couple of times in high school because there was another person with my name. It's common.
Jonny: [1:41:39] But that's free bad-kid social currency, you know: hell yeah, I'm going to detention, baby. You don't even have to do it, so you get the best of both worlds.
Justin: [1:41:52] Well, I used that. What they said to me was, well, if they don't put your middle initial, it's not you.
And I used that excuse for the next four years, even though that dude was a senior when I was a freshman.
Jonny: [1:42:01] Said no middle initial.
Justin: [1:42:03] It's not me.
Jonny: [1:42:03] It's like, can't...
Sadie: [1:42:05] ...make me do it.
Jonny: [1:42:06] That's just social engineering in the real world. People just intuitively do it.
Sadie: [1:42:13] There is no difference between social engineering and con artistry.
Jonny: [1:42:18] Hell yeah.
Sadie: [1:42:19] I will die, I will die on that hill.
Jonny: [1:42:22] Yeah. A good friend of mine is having a crisis of direction in life, and I'm like, okay, so your strengths: you are super good at infiltrating unfriendly organizations and groups of people and taking on roles and shit, and did you know that that is a job? It's totally a job. And a lot of the people who do it sort of accidentally find themselves, seeing it the first time like, holy shit, you can do that, and then suddenly becoming really good at it. Anyway.
Sadie: [1:43:01] I feel like the alternate of that fork is improv comedian.
Jonny: [1:43:10] Their...
Jay: [1:43:11] Their true destiny is they just become podcasters. Improv people are good at doing podcasts. All my favorite podcasts, I've learned, the people did improv. I have no idea what...
Sadie: [1:43:22] ...I'm doing here.
Jay: [1:43:23] Yeah.
Jonny: [1:43:24] That's something you did? Improv, that one episode? Like, you did improv games, or what are you talking about?
Justin: [1:43:33] We had... we did skits, and those...
Jonny: [1:43:36] Oh yeah.
Justin: [1:43:39] I was bad at it. We...
Jonny: [1:43:43] ...were very...
Justin: [1:43:43] ...bad at it, but they're very good at editing.
Jay: [1:43:45] They're so good at editing, my god. When I finally listened to the episode, I was like, oh, wow, they made something out of this. Yes. Justin: [1:43:54] But, yeah, the only thing we didn't mention that I wanted to maybe mention is kind of what we talked about last time: whoever controls the nodes of a graph can control the graph. And so I was also thinking about that as a security problem with linked open data. When we were talking about all of the privatization happening, if someone buys a certain node of the graph, then it's the same problem Sadie was describing with everyone having their own API. If you're controlling this graph, even though it's open, and you control the write permissions, then, I don't know, I assume that's a problem that's going on. Because OCLC has Meridian now, and I assume that it only exists because it will make money. Jonny: [1:44:47] If you control... Jay: [1:44:47] ...the spice, you control the universe. Jonny: [1:44:48] Yeah. Is that a... animal? This is a... Justin: [1:44:53] Very cranky. Jonny: [1:44:55] ...and just desirous animal. It's like, my turn. Like, I'm sure... I haven't heard about this. Today was the first time I heard about this Meridian thing. Is this just... It says May 2024. I assume it's... is it that new? Dorothea: [1:45:11] I hadn't known about it until today either, for what it's worth. Jay: [1:45:14] OCLC just loves to do shit. Justin: [1:45:16] Our metadata librarian at my job is currently on a committee for, I think, what is the organization? The Program for Cooperative Cataloging. And they're working on a task group for URIs in MARC implementation. So I guess they're going to have separate types of handle-based permalinks or something, I don't know, that are going to be in MARC, but they were also talking about how they had a demonstration of Meridian.
And I don't... I think it's just the linked data they've made out of WorldCat. Jonny: [1:45:55] So they're using an entry for Octavia Butler as the demo data. And I'm like, that's an interesting, interesting person and body of work to evoke in your corporate platform. Like, yeah. Justin: [1:46:14] The "don't build this machine." Jonny: [1:46:16] Yeah. Sadie: [1:46:17] The Torment Nexus. Dorothea: [1:46:19] Thank you. Don't create the Torment Nexus. Sadie: [1:46:24] "Wouldn't it be terrible if we created the Torment Nexus?" Creates the Torment Nexus anyway. Dorothea: [1:46:30] So here's a guess, and this is totally off the cuff, just because, again, I only heard about this today. I think it is clear to OCLC that their WorldCat monopoly is not long for this world, one way or another. Whether it's a customer revolt or we finally find a way to do this with linked data without getting sued out of existence, that's not going to last. So how can OCLC come up with a linked data store that they can fence around and limit to their customers the same way that they've done with WorldCat? That's what I think Meridian is. Justin: [1:47:11] Probably. Jonny: [1:47:12] Probably. I mean, as you're saying, they're doing it because it makes money somehow, and I think that's a pretty good bet. And it's continuous with the way that the rest of linked open data has worked. That's what Wikidata is, to some degree: it's basically a captive labor pool. Who funds Wikidata is largely Google. Google bought Freebase, the predecessor to it, did their attempts at cleaning it up and everything like that, and then basically shunted that into Wikidata, and they profit from it immensely because it's clean and corporate-friendly. Like, there's no swearing on Wikidata, you know, and...
Jonny: [1:48:04] ...a way of concentrating a bunch of labor so that they can mine it and make derivative profits from it. And the people that work on Wikidata are genuinely true believers in the beneficence of cataloging the world's data. They're not corporate stooges; they view themselves as, we're just trying to do the same mission as Wikipedia, which is to make a global information store, but they're not really evaluating why Google would want them to do this. And so that pure production-as-captive-labor model is one of the biggest red-pilling moments for information people: what if it's actually bad to have these crowdsourced information platforms that just... So when we were watching Lo and Behold, one of the examples of the beauty of the internet... and again, every time I think about this, this is a movie that was released in 2016, which is not that long ago, and yet it feels like a completely different universe. This is like one of... Jonny: [1:49:21] ...the promising things about it, where you had this crowdsourced chemical-reaction thing where, the wisdom of the crowds, lots of people playing this game about protein folding or whatever, were able to do something that the best scientists in the world couldn't do. And it's just like, cool, but were any of those people on the paper that got published from all of that work? If it's just a thing where you farm out other people's labor and time... Jonny: [1:49:53] Or, just in this case, farm out all of the cataloging labor that happens in libraries into curating this...
...collection of information, in the same way that, I don't know the politics of WorldCat, but I assume it's a similar kind of thing, where everyone is required to use this but we don't actually have much control over it. And, yeah, that is a massive extraction vector hiding in plain sight under the guise of pro-social technologies. Justin: [1:50:22] Yeah, and this is probably more of the same, which is to make that data usable and useful to AI products, I would assume. It's particularly interesting that they mentioned incorporating ORCID and ROR, which are really ScholComm-specific things. ROR especially is a weird one to throw in there, because that's research organizations, right? It makes sure that those are disambiguated, because journals are really, really bad at disambiguating, like, the biology department of this university, because departments change all the time and people abbreviate them. So there's no one identity, and that causes all kinds of problems, even just getting the university right; half the time it's wrong. So ROR is kind of like ORCID for organizations. That's a very specific thing, and I find that very strange. Do they want regular cataloging librarians to fix the ScholComm metadata problems that are out there? Dorothea: [1:51:26] They do. Like OAIster, yeah. Justin: [1:51:28] That, like, Clarivate... Dorothea: [1:51:29] [inaudible] scooped that up back in the day. What's that? Oh, it was a union search engine for institutional and sometimes disciplinary repositories, is what it was. There were always problems with it, but the problems go back to OAI-PMH being complete garbage. For one, it does not allow you to say: is there a full text associated with this item?
And so one of the reasons OAIster became completely useless is that it was choked with metadata-only records, which really disappointed end users, because they couldn't click on it and get to the thing. Right. Jonny: [1:52:15] And that's definitely why I auto-embed Sci-Hub links in all of my writing, because what use is it to someone else for me to cite something if they can't actually see it? Justin: [1:52:28] I wonder how they scrape the full-text information now when stuff gets pulled from OAI-PMH, because it still does. OAI-PMH is how we push out to CORE, but it definitely does know if we've got full text. Dorothea: [1:52:40] I have to think they implemented a check, which is fascinating, because they would have had to implement such a check for pretty much every single repository and repository design in existence. Like, you're literally looking for a link that says PDF or something. Justin: [1:52:59] Yeah. Dorothea: [1:53:00] Wow. All because Herbert Van de Sompel is complete crap at building protocols and things that will be useful and last. All right, I said the name. Jonny: [1:53:10] This is obscure beef. Dorothea: [1:53:13] Oh, you know, Herbert Van de Sompel: when I say serial project abandoner, he is the paradigm example. He totally did that with OAI-PMH. He totally did it with Memento. There are probably six other projects of his that I could also... Right? Memento. Justin: [1:53:33] Remember Memento? Dorothea: [1:53:36] Yeah. And I'm just like, funders, stop giving this guy money. It never turns out well. Justin: [1:53:43] We got more obscure beef than a wagyu farm. Heck yeah. Don't look at me like that. Jay: [1:53:53] I'll look at you however I want to. Justin: [1:53:55] All right, I was very proud of that. Sadie: [1:53:59] It's good. Justin: [1:54:01] Well done. Thank you. I think we should wrap up. Jonny: [1:54:06] Yeah, yes, I've got sleepy bitch disease. Sadie: [1:54:11] Did we clarify what the hell's...
Jonny: [1:54:14] ...going on, or is it still cloudy? Sadie: [1:54:15] I think I've got a pretty good gist, actually. And you know what, knowing the beef actually helps. It does. So good. Dorothea: [1:54:29] That's, like... and you know, I do teach this stuff, Sadie. You know my email address; you can totally ask me questions. Sadie: [1:54:34] That's true. Yeah, that's true. Jonny: [1:54:38] And one of the things I have come to love in this world, you know, the few things that you can love in it, is that every time you get close to something, you realize that it's all just people. All these things that seem like immutable features of the world, one day you might just come face to face with them, like, oh, that was you. And all of a sudden it makes sense: I get why it is that way. Knowing the beef and knowing the people is the way to know the thing. Yep. Sadie: [1:55:18] It all makes sense now. Dorothea: [1:55:23] Oh, glad to hear it. Thanks, y'all. As always, I love being on the podcast. Justin: [1:55:28] Yeah, oh, thank you so, so much for coming on. Yeah, thanks, and I'm glad we got to do this. Jonny: [1:55:34] Yep, yes, good to see you yet again. Let's find time to watch a movie sometime soon; it's been a while. Yes. Justin: [1:55:42] Oh yeah, I need to do more movies in the Discord, which I was about to plug, because, Dorothea, you've also been answering questions in the Discord. It's very helpful, yes, and we appreciate it. Jay: [1:55:53] It's just us shitposting and you being helpful. Yeah. Dorothea: [1:55:56] Well, I mean, you know, that's the inverse of the way it usually is. Everybody else is being helpful, and I'm shitposting. So, hey! Sadie: [1:56:04] Evens the score. Justin: [1:56:07] Good night.