Justin:
[0:26] Hi, I'm Justin. I'm a scholcomm librarian. My pronouns are he and they.
Sadie:
[0:30] I'm Sadie. I work IT at a public library, and my pronouns are they, them.
Jay:
[0:34] I'm Jay. I'm a cataloging librarian, and my pronouns are he, him.
Justin:
[0:37] And we have a guest. Would you like to introduce yourself?
Mike:
[0:40] Yeah, hey, I'm Mike. I'm an open scholarship and publishing librarian, and my pronouns are he, him.
Justin:
[0:45] Welcome. Thanks for coming on. I just kind of shot you a message out of the blue and was like, hey, you talk about really cool stuff in the secret underground IRC chats that we frequent. Why don't you come on and talk about some of that stuff on the podcast?
Mike:
[0:59] I'm excited about that. I love to talk, and to do so more than is probably appropriate, often, so I'm super into it.
Justin:
[1:07] No, sounds good. Yeah, so the way we know each other is we're both in these secret underground Discords that are really super hard to find, I bet, and we talk about scholarly communication stuff. So it's all different types of people across the gamut of scholarly communications: the metadata of it, the AI incursions, the way it makes all of us a little bit crazy, the leftist politics that you more or less have to adopt. And if you don't, you end up as a weird kind of person who writes a blog called Scholarly Kitchen.
Jay:
[1:40] I used to work with that person, by the way.
Justin:
[1:43] She's not the worst, no.
Jay:
[1:45] Not the other one, too. I don't know. Yeah. Oh, no.
Justin:
[1:52] So then you're like, you know, it's really important that publishers get their way on everything. It's like, oh, okay.
Mike:
[1:58] Did you know that they're our friends?
Justin:
[1:59] Oh, the publishers. I thought you were talking about the chefs. I was like, no, mostly they just follow us on social media and steal our ideas.
Mike:
[2:10] Yes, this is true.
Justin:
[2:11] For the blog. Imagine what they'd be about if they were in the Discord, though. They would be like, what is the current state of when you nut and she keeps sucking? It's like, yeah.
Mike:
[2:23] Well, I think they would really struggle. I always tell people that I'm on a Discord for terminally online librarians and it's primarily shitposting. So, you know, there are a lot of issues we discuss, but it is predominantly a place for shit.
Justin:
[2:39] Yeah, I love it. It's really what's kept me going when Twitter started to die and, you know, a lot of cool people left, and some people who aren't on that Discord. And I was really sad. So I'm really happy people are finding their way back to Bluesky, and I'm seeing some faces I haven't seen in a long time and hopefully will get them back on the podcast, because I really was not in touch with them outside of social media.
Mike:
[3:00] It's been a very slow moving diaspora, just sort of leaving one terrible place.
Justin:
[3:05] Yeah, that's true. So everyone knows us, but Mike, why don't you introduce yourself and tell us as much information as you want about yourself personally, but also what you do.
Mike:
[3:14] Right. I'm the open scholarship and publishing librarian at the University of New Brunswick, which is like way up on the East Coast. It's sort of just easterly of Maine. And I started there maybe 2013, 2012. I'd spent years before that working in sort of a digital publishing part of the library. I was specifically using Open Journal Systems by PKP, which we'll talk about here in a while.
Mike:
[3:38] And eventually, I just hit this point where it was like, I'm doing the work of a librarian, but I'm not being paid like one. It'd be really good if I just went and got the letters and got through it. I was lucky enough to end up, you know, getting a job at the place that I had worked prior, which is also more or less in my hometown. Then I could step into a role where we really didn't have anybody doing scholcomm before, and I could really sort of make it my own position. At the same time, I started working for PKP in probably, geez, 2012, 2013. While I was in library school, actually. I went to Western in Ontario, and I needed money. I had my, like, third student loan, my third degree, and I thought, well, I should probably have a job while I'm here. So I started working part time for PKP, and I just kind of stayed on helping them with hosting support for the journals they host, and development stuff, and all these other pieces. So I do both of those things. And it's nice to have a job where I have a side hustle that just sort of dovetails perfectly into my regular work as a librarian. It's pretty advantageous. And it's pretty good. So those are the broad strokes, I think.
Justin:
[4:45] So are you like officially an employee of PKP?
Mike:
[4:48] I'm a contractor, technically. There are a lot of Canadian librarians in particular who end up doing contract work with PKP, where you work maybe, you know, 30 hours a month or whatever. And then you just sort of bill what you work and that's it.
Justin:
[5:01] So how did you get into this kind of work, like, specifically? Because I know we have a lot of graduate students. So if someone wanted to do what you do, what areas should they be looking into? Because I know this is all circuitous, like there's lots of different ways to get into this kind of work. But how did it work out for you?
Mike:
[5:18] So I had a friend in the history department where I'd been working and he was doing like a digital humanities project for a professor who had gotten a big like Canada Research Chair grant. And he was just writing XML. He was like doing XML markup of these specific documents like TEI. And at one point he said to me, hey, do you know any XML? And I said, no, but it can't be that hard. I know HTML. And he was like, great, come interview for this position. And I applied. We spent most of the interview talking about David Bowie. I got the job in this very loose place with lax workplace vibes. And then I said.
Justin:
[5:54] You know you've got the job when the interview is entirely about David Bowie. It's like, oh, yeah, I got this.
Mike:
[5:58] Yeah, 100%. It was very clear to me that everything was going to work out. But the job was basically taking all of these journals at the institution that the library had hosted for years and moving them into Open Journal Systems and at the same time doing XML typesetting. And so I was learning XML as I went. And on top of that, I was learning a French schema. There's this consortium in Quebec called Érudit, and they're part of this big organization called Coalition Publica. And the idea is they're basically a platform for pulling together all these Canadian open access publications. So I learned XML with the French schema.
Mike:
[6:35] And then I just met a bunch of people I liked and kept working with the software I liked, met people from PKP as a result, because we were sort of doing a little bit of co-development work. And then before I knew it, I was like invested in scholarly publishing work on that side. And I knew that it was scholarly communications work, but I didn't really know what else to do. And when I went to do my library science degree, I kind of had to explain open access and open source software to faculty, which is pretty fun.
Mike:
[7:02] Someone's like, tell me a little more about this open access. It's like, you're a professor at a library school? Like, what is happening? So yeah, there weren't a lot of people stepping into scholcomm, really, I think, at the time. Now I would assume there are a lot more people stepping into that space than when I started. But yeah, it just sort of worked out that I was doing work that I liked and knew how to do. It was technical work I was capable of doing. And it helped that I'd spent time working in a library in a technical role before I went to library school, because that meant when I came out, I had chops, and I'd worked in an academic library, and I understood a bit about the setting. And maybe the worst thing about wanting to work in academic libraries is sometimes the requirement is that you have worked in one before. But when you have the letters, people are hesitant to hire you for something that isn't a library degree. So they say like, well, you don't have an MLIS, or you have an MLIS, so we can't hire you for this lower position. You're a flight risk or whatever. And they're less likely to hire you. So it's really hard to get experience if you didn't already have it before you started your position. So it kind of worked out that way. And there are lots of these jobs available at libraries. They're always looking for student assistants who have done this work. And you can, you know, you can say like, well, I did a little bit of markup, or I did a little bit of web support stuff, or we did scanning and OCR or whatever else. And I think that's usually a good foot in the door for getting the experience before you go off to make something official.
Jay:
[8:20] Completely unrelated. PKP. I had to like refresh my brain because I kept being like, isn't that the thing in Rojava? But that's the PKK, the Kurdistan Workers' Party. But I was like, I got my acronyms.
Mike:
[8:33] Yeah, that's legit. PKP also shares an acronym with the Polish Railway. So that also happens a lot.
Jay:
[8:40] And I think the Communist Party of the Philippines, too.
Mike:
[8:43] Yeah, sure. I mean, we're comrades, all of us. Yeah, so PKP is interesting. They're the Public Knowledge Project. They've existed since like 2000, 2001, sort of a result of the Budapest Open Access Initiative and all of these other pieces. And the idea is that they make software that allows scholars to kind of take back the means of production, I guess. You run independent journals, or you take your for-profit journals, or you take your diamond OA journals or whatever, and then you put the tools to run those in the hands of the scholars who run them. And they've worked for a really long time for an international audience. There's a huge volume of OJS users in South America. There's an incredibly fast-growing group in Indonesia, just all over the world. It's really great. And it means that we also have a lot of multilingual support and a really diverse group of users. The OJS community is enormous. And then PKP started making stuff like Open Conference Systems, which is now dead, but was cool for a while. So if you wanted to run a conference, you could do your paper submissions or whatever. There's Open Monograph Press, OMP, and there's a preprint system, Open Preprint Systems. But all of these software stacks have the same sort of goal, which is tools that get this stuff out of the hands of for-profit companies, basically.
Justin:
[9:59] Before you got into doing that first job where you were working with XML, were you already interested in open software? Was that one of the ways you understood conceptually how openness worked?
Mike:
[10:12] Actually, I think OJS was my start with, really, in a big way. But because I'm an elder millennial, I'd spent much of my youth just sort of breaking computers to learn how to use them. So I'm very familiar with spending time with software, not really knowing how it works, but wanting to learn by playing around, sort of autodidact, I guess. And that meant I was comfortable using the software when other people weren't. And also when it didn't have a lot of, say, rich documentation or specific user support or any of that other stuff, which I think led to comfort with the software and led to a natural position with a bunch of people just trying to figure out how stuff works.
Justin:
[10:48] Yeah, and on that note, I don't know if I should bring that up later. Yeah, let's do it now. When it comes to tooling, because you were saying your first job was working with XML directly to do layout, what are some of the biggest tooling challenges that you run into on a day-to-day basis, especially with PKP? Because, you know, people always need new tools to make things work. They can't do everything in XML or, yeah, by hand.
Mike:
[11:16] Right. Well, at the time, the worst part was just that if you wanted an XML editor, they were all crazy expensive, right? I think we used XMLSpy, and there was also Oxygen, and a couple of these other systems that were like, I wouldn't say prohibitively expensive for a university, but obnoxiously expensive for what they were. And at the time, there wasn't like Sublime Text, or Atom, or Visual Studio Code, or any of these things. So we didn't really spend any time using much more than these. So PKP runs almost entirely on open source software also, and PKP software is entirely open source. Every now and then we'll have to pay for something like, you know, maybe testing software like Travis or something. If there's something we may need to buy that is helping with QA or other pieces, we might do it. We use Notion internally, for example, just because it's always been a nightmare to try to manage where the documents go.
Mike:
[12:07] We use one for client support, but mostly we use open as much as possible. And the community keeps us accountable. We have this documentation hub that I linked in our notes here. And it's hosted entirely using Jekyll on GitHub Pages. And I remember when we launched it, someone was like, it doesn't have search. And at the time, we just didn't include it. And sort of retrofitting search didn't work. So we put, as a stopgap, just a quick Google search on it. And there was like an outcry on the forum of like, why would you use a Google product?
Mike:
[12:37] It's like people are actively angry. So it's nice that we've got a community that holds us to account, I think, when we're using software that they don't support or they're not willing to bend on. So we're going to move the Docs Hub here, I think, before long. I think we're going to migrate to Hugo. And ideally, we're not going to have to worry about stapling a Google search to it. But people were real mad. So that's part of the objective, too, at PKP: making sure that with the software that we're using, we're not pricing anybody out. We're using as much open source stuff as we can. And our developers are all over the world. So everybody uses what they're comfortable using, so long as it's compatible with what everybody else is using. But we've got a bunch of folks in Brazil. We've got a couple of folks in BC. We've got someone in Russia. We've got someone in Ukraine. This one guy in Ukraine was working on a new email module while Kyiv was being bombed. He would be like, I'm really sorry I was late for the meeting. And we're like, you're under attack. What are you talking about? But it's just this hugely dedicated group of people who work on, you know, whatever they're comfortable working on.
Justin:
[13:40] No, that's great. I guess sticking with like the, we have like these objectives in mind and we have these values in mind. How does like open infrastructure as a whole play into it? Like what's PKP's role in open infrastructure? Like what other organizations are they working with? How are they getting open out there in different ways?
Mike:
[14:00] Right. So PKP, I think we've signed the POSI standards, the Principles of Open Scholarly Infrastructure. I've argued for years that OJS is open scholarly infrastructure. It's a publishing platform for many journals that pushes to a number of other places. In terms of organizations that we've done co-sponsorship with, Crossref has always been very friendly to PKP. They have co-sponsored a lot of the work on their plugins and other things. So they've been really handy. We're currently working with ORCID on some ORCID plugin stuff. There's an OpenAIRE plugin that we wrote. Portico, lots of other places that are sort of about pushing and pulling metadata. And typically what happens is like somebody wants a plugin. A good example of this would be one time I was asked to be the sole representative in a meeting with iThenticate about the new version of their plugin. And they said, you should really make us a plugin. And I said, you're a multimillion-dollar company that is a for-profit. You should pay us to make you a plugin.
Mike:
[14:59] They kind of understood. But if I hadn't pushed back at them, they might have just asked us to do it. So sometimes we'll do co-development with people where they pay for the sponsored development or whatever. And it's not really a platform we necessarily endorse, but they've paid for the plugin to exist in the software so they can meet the needs of their members or whatever. But a lot of the time what ends up happening is a plugin is developed by somebody in the community. And it's used by a lot of people who use the platform. And then we adopt it and we maintain it. Maybe somebody retires, for example, or someone switches jobs and a plugin they maintain just goes unmaintained. And so we adopt it and pull it into our repo, or we write it into core. So it's just part of the software. So it's a real kind of push-pull between all of these organizations. You know, there's a DataCite plugin. There was a mEDRA plugin. Nobody ever talks about mEDRA, but we had a mEDRA plugin. There's a lot of these. So if there's some interaction to push or pull metadata out of a publishing system, we've probably had at least a conversation. We even had what I would describe as a reasonably functional relationship with Google Scholar for a while, where we could get into meetings with them and have conversations about how the content was being indexed. You know, to me, that was always kind of surprising.
Justin:
[16:07] Yeah, because for a while, I think anything hosted on OJS basically went straight into Google Scholar. You didn't really have to do anything else.
Mike:
[16:16] Yeah, all of the meta tag stuff was kind of automatic. And you didn't need to stress too much about it. But whenever there was a problem, we could talk to the folks who were at Google Scholar. It was Monica Westin at the time, who's no longer there. But she used to be really great. She would meet with us basically right away. And we would try to solve problems collaboratively. I can't say that's always been our relationship with Google Scholar. But for a while, it was our relationship with Google Scholar.
Justin:
[16:39] Speaking of Google Scholar and open infrastructure, what's the relationship between discovery and open infrastructure? Because, you know, we've got OpenAlex now, which took the Microsoft Academic Graph and was taken over by the people who made the Unpaywall browser extension. And they're trying to basically build a free version of some of the most expensive software a library can buy, which is like Scopus and Web of Science, and trying to make it work, and they're doing a great job. But how's the field of that, just from where you're at?
Mike:
[17:14] Yeah, we're actually really lucky. Someone who gets talked about a lot in our circles is Juan Pablo Alperin. He runs the ScholCommLab in Canada, and he's also one of the directors at PKP. I've known Juan since he wasn't even done grad school, but he's in this position to sort of push a lot of these things and always wants to work with folks. So he's doing a lot of work with OpenAlex these days. And one of the things that Juan had kind of identified, and he's been pushing for the last couple of years, is this idea that in order to be in OpenAlex, you have to have DOIs, but not every journal has them, or they might be cost prohibitive or whatever else, technical overhead, or they don't have someone to update, or they're running an old version of OJS and they can't be in it. So Juan was working on this. PKP has this data collection thing called the Beacon. So if you install OJS, it sort of pulls basic metadata. So we know how many installs there are. We know how many institutions are hosting, we know how many journals, how many articles are published in OJS. We're not getting big user stats, but we're getting this sort of broad picture of what OJS looks like. And Juan realized that if we just had everybody with OJS in OpenAlex, we wouldn't need to worry about DOIs at all. And I think they're just going to do it. They're just going to make a plugin, send the data. They can send Beacon data now, but maybe make a plugin that just does this, so that anybody using OJS is just getting their stuff pushed to OpenAlex, which I think would be great. Super great.
Jay:
[18:39] Speaking of discovery, in one of my previous jobs, I was a metadata and discovery strategy librarian. And part of that was that I maintained our Primo VE discovery layer. And also part of that was we moved our special collections, like digital collections, into our institutional repository.
Jay:
[19:05] And we had bepress, which I don't know if you've worked with bepress. It has the most annoying metadata in the history of the world. I think its metadata works well for journals, right? It's very journal-focused metadata. It's about affiliation, it's about article level. It's very easy to fill out this exact Excel template that they give you, and then they have, you know, a crosswalk to like Dublin Core for that. But it's meant for this one specific thing, and so anything that deviated from that but still had to go in that system, and then making that discoverable not just within bepress but within Primo, was like a nightmare.
Jay:
[19:49] A complete nightmare. And then imagine telling Google Scholar, hey, we want to make this discoverable. It's like, well, how do you separate, you know, faculty publications from this historic dress collection we put in there because Fedora 3 would break if you looked at it funny? And so we just had to put it in bepress, right? It was so hard to get all of these systems just to fucking talk to each other, let alone the fact that different journals have different ideas of what metadata is important and what taxonomies they use and what those standards are. Oh, it's the most annoying goddamn thing in the world. It's so annoying. It's both like, yay, metadata anarchy, and also, use the same fucking thesaurus. Like, yeah, ideally. It's so annoying, yeah.
Mike:
[20:42] It's really funny, because the bepress situation in the States is so interesting. I was in some of the early Next Generation Library Publishing calls that California Digital Library was doing. And basically, they were like, okay, well, we've got OJS for journals, and we've got a repository, and we need to make sure all this stuff is searchable. And I remember sitting in the call thinking, because at our institution, we had DSpace, and we had OJS, like, these things have always been separate. And at most institutions in Canada, they've been typically separate. There are a couple of bepress schools, but not many. And it was very clear to me when everybody was talking about their use cases that I was like, oh, you're just all very damaged by the loss of bepress. Like this entire thing is a coping mechanism for getting bepress back.
Jay:
[21:22] Makes sense to me.
Mike:
[21:24] I was like, just use OJS and just have a repository. And they're like, we're going to make this whole digital front end for all of our digital collections. And I was like, don't you just have an ILS? Like, what are we doing here? Like, what is this? They're like, well, you get all these six things. Yeah, but we can have a stats dashboard. And I'm like, that's okay. But, you know, have you considered that maybe there are just other solutions than open source bepress? Like, maybe there's something you can do. Yeah, the metadata thing for journals is always so funny, because for a long time I gave all these talks on metadata fidelity and metadata hygiene and all the stuff you need to know about metadata and the stuff that's important. And it is important, right? You want people to make sure that they're not feeding trash into this shotgun blast of metadata across the internet. But the other problem is that people are fallible, and metadata is also just bad. As soon as I could kind of just embrace, like, what if metadata is supposed to be a little bad? Maybe it's like loving an ugly dog. It's like, it's okay, it's just how you're supposed to be, I think. Yeah.
Jay:
[22:27] My attitudes on metadata being just, like, good fucking enough changed real quick at my current job. Because as large as our collection is, and how much we put through, and just how hard it would be to clean everything up, because things come from like six different vendors, it's like, as long as someone can fucking find it, it's fine. Like, do you have a wrong indicator somewhere? I don't fucking care. It's fine.
Mike:
[22:54] Just move on. Yeah, I've played in bands for years, and when you're just a little out of tune, there's a threshold for, like, close enough for rock and roll. This'll do. Yeah, I'm not 100 percent in tune, but it's enough. Yeah, exactly.
Justin:
[23:07] And it's weird that those problems don't go away when you do have bepress, because bepress wants you to create journals in it. Right. But it's also an institutional repository where you can put preprints, postprints, things that are, you know, we're kind of doing this to get around the fact that open access isn't ubiquitous. So we do all these things where we go, hey, make sure you give us this exact version of your paper because we can't have the version after it. And this is our one weird trick so that we can get it to be open access.
Justin:
[23:37] And the thing is, you know, it would probably be better just to run our journals in bepress, but we have been running them in OJS as well, because we've had that for a while too. And we also use it for our digital collections because, you know, part of what I did was just consolidate all of these services and go, okay, we're going to make these things much more discoverable. They're not going to look pretty. It's going to be an institutional repository. Yeah, it's a file system, basically. And, you know, we've done a lot to make it look good, but some people go too far and try to make it do things that it just can't do, instead of accepting the limitations. Because once you accept the limitation, like, bepress can't do that, then you finally have to go, oh, well, maybe if we want it to look pretty, let's build a website somewhere else, make the website look pretty, that's going to be the front door, and then we can link stuff back to bepress. So that way we don't have to do some absolutely wild mixture of collections, sub-collections,
Justin:
[24:40] buttons that take you to different things, duplicating things. Me and my scholarly communications librarian, so I supervise a scholcomm librarian, and we have different views on this. I'm very much like, it's a file system. It's not going to look pretty. It's never going to look pretty. And he's very much like, well, what if someone was browsing through bepress? I'm like, who? What psycho is browsing through bepress?
Jay:
[25:03] Yeah.
Justin:
[25:04] Because what happens is faculty look at the site, and they go, oh, well, couldn't you navigate this way? And I go, none of your traffic is going to come from someone doing that. It's going to come from two places, Google and Google Scholar. That's how people are going to find it. They're never going to see our repository. Yeah, or maybe Unpaywall. Yeah, Unpaywall will take them straight to the PDF. You will never see our repository. So there's no point trying to make it do things. It's fine to make it look pretty, but trying to make it do things that a website does is just, it's not what this tool is for.
Mike:
[25:39] Yeah, we had this. So we used to run a repository system called Islandora. And Islandora was made in Prince Edward Island. And it was like middleware: it was a Drupal front end and a Fedora back end. And Islandora was kind of like a jack of all trades. And we were the institution that actually paid for their Scholar module, which was their IR module, to be ported to Drupal 7. And basically, every time we tried to get Islandora to do something that we thought made sense, we could only kind of do it. But because we were a Drupal shop, and a bunch of our other sites were Drupal, we kept trying to fix these problems and contribute back to the community. And it was always an enormous nightmare. And when we knew that the writing was on the wall for Islandora 7 going EOL, and at the time there wasn't an exit strategy, we decided to move back to DSpace. And our developers were like, well, you know, we're not a Java shop. Moving back to DSpace, we can't fix or customize or do whatever. And I was like, yeah, that's great. We can just say no to people. It's awesome. Can this do it?
Jay:
[26:35] No.
Mike:
[26:35] Nor should it. We want it to be as close to stock as possible so we can update it. So we don't run into this problem where you can't update something ever because your customization to make a thesis graduation year appear on a landing page broke your update. Like, none of that stuff is good. So I've been banging this drum for years. Like, nobody whose opinion you should care about is going to browse a repository. You know, ignore the people who tell you that you need to browse a repository. There's just some busybody who goes, I wish my collection looked this way. It's like, you're the only one.
Mike:
[27:06] You're the one looking at this because you want to know how your work looks. You're the one Googling your own name. And we're under no obligation to make sure it works. Repositories don't need to be sexy. They just need to be functional. And I think, in particular, I feel kind of like trauma about this, weirdly. Like, I started my career when everybody was doing big presentations about, like, here's our new repository. We ran our own. We're on Hydra. We're on whatever other system. Here's this sexy thing we built ourselves. And then like three years later, they'd have to dump it. And at one point I said, like, I don't want the repository to be a project. I just want the repository to work so I can do my fucking job. Like, just let me help people upload files. That's all I needed to do. None of this stuff is important. All this branding can go, all these extra functions. Just standard metadata that goes where it needs to go, and it works. I often say DSpace is like the Danny DeVito of repositories. It's stalwart, and everybody knows it's got a low center of gravity. It's hard to knock down. It's straightforward. Like, please, let's just use something simple. Yeah, absolutely. It's got an egg for you in these trying times. You know, there's just no reason to do all of this extra stuff. So yeah, I'm hugely supportive of that. And the beauty of just saying no to people, like, no, we just can't, that's just not something we can do, and I'm not even going to explain it, I don't need to, the software doesn't do it. It's just so liberating. Yeah.
Jay:
[28:27] Not the stars. Not today.
Mike:
[28:28] Well, I think for a very long time a lot of librarians wanted to prove their worth. Like, I think with the scholcomm piece, you would go to faculty and you wanted the faculty to take you seriously and you wanted them to like you, so they would come back. And so you would say yes, because you desperately needed them to engage with what the library was doing. And I think that part of the work is, I want to say it's totally over, but I'm already doing valuable work. I don't need to prove to them that I know what I'm doing anymore. I don't need to bend over backwards to appease these people. I'm just doing my job and that's enough.
Justin:
[28:58] Yeah. I think the real trick is to keep them from ever seeing the repository. So that's why we offer mediated upload. So what we do is we send an email that says, hey, we saw that you just published. Since you just published, you might still have this version of the paper. Just email that to us and we'll put it in the repository. I'm sure we've got tons of faculty whose work we have in the repository who've never seen what the repository looks like. And that's good. Yeah, that's what we want it to do. Now, when we migrated a bunch of stuff that was living on a website, because, again, website hosting is, you know, unstable, there were a lot of these oral histories, these videos, these documentaries. We said, we'll put them all into bepress. And then what you'll do is reorganize that crazy website you had, and then just link it to the video. And people will go directly to it. They'll see as little of the repository as possible. Yeah. And I think that's kind of what it needs to be. Although I do, in my heart, still want Hyku or something. I do want to have that and I want to play with it.
Mike:
[30:02] Right.
Justin:
[30:02] But I know, and in fact, this is the next thing I was going to ask you about, is consortium models. So my university is part of Texas Digital Libraries. They host all of the DSpaces. They host all of the OJSs. And like you said, they don't customize it because when you've got to update 200 journals, 200 OJS instances, you can't have everyone customizing stuff because it's going to break. And it's already going to break anyway. I mean, you know this because I was talking to you about something that just was acting weird in our OJS instance one time and was like, why can't I see the error report? That's weird. Okay, whatever. But imagine how much worse it would be if everyone was trying to get their UI to look different. Yeah.
Jay:
[30:47] It's not MySpace.
Mike:
[30:48] Right. I mean, in some ways, if a person is hosting their own content and they want to take on the responsibility, go for it. You know, in OJS, you can upload a CSS file. And that's often enough for you to do whatever kind of damage you want to do to yourself that's easily resettable by clicking the revert to original button. But the open infrastructure question is really interesting in this sort of shared consortial piece. So I actually, I don't think I included this in our notes. So in Canada, we don't have like a nationwide, we have the Canadian Association of Research Libraries, who are sort of a library association who push forward different initiatives, but they don't have services. But in Ontario, there's OCUL, which is the Ontario Council of University Libraries.
Mike:
[31:30] They have this organization called Scholars Portal. And Scholars Portal operates kind of the same way that your regional consortium works. They host repository software. They do software as a service locally.
Mike:
[31:41] They do OJS. They do a bunch of other things. But in the last couple of years, they started to do this thing where they would launch repository platforms for people who don't have them or can't have them. I don't know if this is the same in the States, but in Canada, what's happening is a lot of schools, especially libraries, are losing any developer roles, because of austerity principally. And then it's all university IT. And university IT are fucking dog shit, and so you're not going to get anything. And it's like a weird fiefdom. Like, they definitely have mall security cop syndrome. They're power tripping on the dumbest things. And so, yeah, what's happening is when somebody needs a data repository, they're like, oh no, there's a data policy in Canada, what do we do? And it's like, well, you should run a Dataverse. And so what Scholars Portal is doing is they're launching like a nationally hosted Dataverse install so that schools can have one there and they pay less than they would pay for software as a service from another place. And that project's called Borealis.
Mike:
[32:31] They always have to have a cute name for the thing. So it's just a big collection of Dataverse installs. And most Canadian institutions are on Borealis. And I'm on the metadata and discovery experts group for a new project called Scholaris, which is the same thing, but with DSpace. So all of these schools who have ancient installs of DSpace 5 and no staff to update them, or they've got something bespoke, or they're on CONTENTdm or whatever they're on, they need to have something that works. And so they can approach Scholars Portal and have their content migrated, and it all lives in this place. And so in this case, that distribution is really nice, because it takes the customization out of people's hands. And the biggest struggle we have in talking to librarians about this, and I think.
Mike:
[33:13] I'm from a different perspective. In the publishing space, metadata is relatively rigid. You know what you're going to be giving to people. It's stuff that exists in JATS. It's stuff that exists in Crossref. Your title, your abstract, your affiliations, your references. It's really not too complex. But one of the things that has been happening in repositories for years is people just staple whatever the fuck metadata they think is important that they want to appear on a landing page. And they just put all kinds of stuff in, and they put in stuff that isn't Dublin Core. It's made up like DC degree period.
Mike:
[33:42] What is this and why is it here? And it's because, this is my Dublin Core. Bad, right? Well, and it's somebody wanted it, right? They wanted it to appear on a page, and so they recorded the thing and they think it's really important. And so I've had to explain to all these people, like, if your metadata isn't usable downstream, the only purpose it serves is for display on your repository record, which is a place no one looks. So there's really no reason to be recording all this stuff. And then someone will say, well, how do I have all my custom forms, and how do I have all my custom this? And I just go, you don't do it. Just don't. You don't. It's a wonderful excuse to not have to worry about any of this stuff. So as people are sort of thinking about migrating over, you know, I'm having a lot of conversations about, like, making sure your repository is OpenAIRE compliant. OpenAIRE, I think, is like a real blind spot in the States in particular. I think we don't talk about OpenAIRE a lot. It's basically this big EU project to index or harvest all of this material from repositories. So, you know, obviously, for years, lots of people have said the problem with repositories is I can't search through all of them simultaneously. OpenAIRE is meant to kind of solve that problem. So you push your content to OpenAIRE. And then OpenAIRE does this thing where they're also pulling metadata from, like, ORCID and Crossref and DataCite and whatever. And then they'll, like, de-dupe those records and merge the metadata. And then they can push the combined, better metadata back up to you, which is like the dirty idea.
Jay:
[34:58] Can I kiss the person who made this?
Mike:
[35:00] You can. The downside, they do use like a little smidgen of AI, I think, in their de-duping. But I think that's relatively newish. And I think it also probably prevents somebody from wanting to unalive themselves, is my suspicion. But I think it solves a lot of these discoverability issues, right? I can hop into OpenAIRE and I can go, show me Canadian research. Show me Canadian research on the coronavirus. And it's like, yeah, here you go. Here's all the repository versions. And here's all the DOIs for all these pieces. So, like, as a piece of open infrastructure, it's really huge. So if you've got all of the Canadian schools in this, like, centrally hosted place (it'd be better if it was distributed, but I'll take the running good new software instead of the running DSpace 5), and they're all pushing their stuff to OpenAIRE, then it's more discoverable than it would have been generally. And they don't have to wrestle with Google Scholar or wrestle with these other things because Scholars Portal will do that for them.
Justin:
[35:51] Yeah, OpenAIRE is, I'm pretty sure even when I was running, so my first time getting into scholcomm, I was a generalist librarian. My job title was metadata and emerging technologies, which was what I got instead of a raise. I got that job title.
Mike:
[36:07] Sick.
Justin:
[36:09] Well, my director knew what was up.
Jay:
[36:11] Emerging technologies.
Justin:
[36:12] She's like, I know I can't give you a raise, but I know you're going to be out of here in a year or so. So let me give you a fancy-sounding job title. Good on her.
Mike:
[36:20] Fair enough.
Justin:
[36:20] She was doing the best with what she did. That college no longer exists. Okay, she was doing the best with what she had. So I'm very grateful for her. The supervisors, most of the supervisors I've had in libraries. And I think even with CONTENTdm, I was able to push stuff to OpenAIRE. And that was how it got into Google Scholar. And that was how it got everywhere. And then when I started my job at my current university, we had DSpace and I said, okay, let's get this. This has not been connected to any of these things. Let's get it connected to OpenAIRE. Let's get it into OpenDOAR. Let's get that metadata pushed out. And then we went to bepress and I said, bepress, what do you do? And they go, everything goes through OpenAIRE.
Mike:
[36:59] That's it we're done right and.
Justin:
[37:01] I was like yeah cool i don't have to do it because.
Mike:
[37:03] I was like because.
Justin:
[37:04] I was like, you know, give me your RSS feed or whatever it is. I need to, like, register independently with OpenAIRE. And they're like, no, we.
Mike:
[37:10] Got it. Yeah, you need all your OAI endpoints and OAI-PMH. That's what it was. Yeah, OAI-PMH. Yeah. So I think, in Canada, the push for this stuff has been really good, and we have a receptive group of people who are all using these ancient, esoteric... Like, the metadata I would describe as like archaeological strata. You look through and you're like, oh, I can see the brief period in which you recorded the year in which a student graduated on their thesis metadata. Or I can see the year in which you included, like, this department that no longer exists but had its own special metadata requirement, or whatever. Like, it's nice to just sort of say, you can throw all this shit out. Nobody uses it. You don't need it. But that's hard. Like, librarians don't like to throw things out. So it's really hard to kind of break people. And the other part too, I think this is the open source part, is now I'm saying to people, stop going to your IT department and asking them to break this in a subtle way for you. Instead, what I'm going to ask you to do is log into GitHub and see if there's an open issue in the DSpace repo about this problem you're trying to solve, and then find someone to contribute to it and fix it so it's solved for everybody. So instead of you guys just, like, going down the hall and knocking on someone's door and sweetly asking a guy who inexplicably still has a ponytail to fix your problem, do this other thing. Just go be part of the open source software universe and contribute back and say to the developers, yeah, we do really need this thing. We do really need that thing. I think that stuff's really important.
Mike:
[38:39] Learning what open source software is and how to contribute back and how to be part of that conversation is huge. And a lot of people haven't done it. They've just gone to the one person in tech support that can do it for them.
Jay:
[38:47] And you can do that if you don't even know how to code or anything. That's the thing. People think that in order to use GitHub and participate in these things that you need to know how to code and that you're participating in that way. No, you don't. You can just go be annoying and be like, hey, this is broken. Can someone fix it, please? Thank you, love you. And then it happens. It's going to be a bunch of nerds like, oh, fuck. And then they want to fix it. Like, it's great.
Sadie:
[39:09] I was about to say that. That's like the siren call. Like, this is broken. And I need it. Thank you. Love you. Bye. And I'm like, perked up like a fucking prairie dog. Like, how can I do this? That's a good Nike person.
Justin:
[39:25] I'm useful. People love feeling useful. I don't want to get on a whole rant about motivation at work. But give people freedom to be able to do stuff. And give yourself that freedom by learning how to contribute to community projects. And, you know, I mean, if GitHub is not your thing, maybe you're more of a writer, get into fanfiction communities and just learn how communities do things. Seriously, get into piracy. Learn how pirate communities work. Like, do things. By the way, we're rated explicit, so you don't have to say things like "unalive." I just didn't want.
Mike:
[39:58] To we've.
Justin:
[40:01] Been rated explicit a long time so you're not gonna say.
Jay:
[40:03] If i get too much.
Mike:
[40:04] You don't need to say, by the way, get into piracy. You need to say get into preservation. No.
Justin:
[40:08] I meant piracy.
Mike:
[40:12] Like, this is one of the things I love about working for PKP, right? Like we do, so there's, at a PKP sprint, which they hold all over the world, whenever we.
Jay:
[40:19] Love a sprint.
Mike:
[40:20] They encourage people, they're like, look, you don't need to code to show up to these. And so most of the time, half of the room are people who just like OJS or work with people who use it. And then they sit down at a table with some developers and they try to figure out a scope for what a project would be to work on something that isn't working for them. And then that turns into a deliverable project or our documentation is heavily community supported. They do documentation sprints like every Friday, people log in for an hour and they just work on a couple docs that need updating and they push it all back up to GitHub and it's taken care of. Like having people sort of contribute and feel empowered to dig in and not intimidated and they're working with other folks in that space, I think it's huge. And it means down the road that those people are picking up more tasks and they're showing up more in GitHub and they're reporting issues because they get more comfortable and they understand it.
Justin:
[41:07] Yeah, it's sort of like Wikipedia too. Like, maybe you're not going to write the Wikipedia article, but you can go to the talk page and be like, are you sure this is the right term for this? And, you know, you've documented it. You've documented, like, hey, this is a problem. I don't think this is the right word for this demonym. You know, like, I've seen that one. I've seen the n-word written out in talk pages and hyperlinked. The guy felt the need to hyperlink the n-word in his talk page for Inuit because he had a point to prove.
Mike:
[41:40] Cool. No, thank you.
Justin:
[41:42] So maybe don't go on Wikipedia. But anyway, it's the same idea. Yeah.
Jay:
[41:46] Wikidata is where all the cool nerds are.
Justin:
[41:48] Yeah, yeah, which I've still never just sat down with and done anything, and I really, really want to. Hopefully I'll get to in the future. On the topic of consortial work, I had two different questions. I'm not sure which one's more on topic. Okay, I'll leave it up to you. Do you want a question about the nature of publishing or research data management? I.
Mike:
[42:11] Think, well, I know why research data management is front of mind for you at the moment. But I think the nature of publishing is more my vibe.
Justin:
[42:17] Okay. So the nature of publishing. Something I've been thinking about for a while is, you know, the way that we have Texas Digital Libraries is all these OJSs are hosted for us. All these DSpaces are hosted for us. They get consortial group rates on DataCite to mint DOIs. If there was a parallel organization, because unfortunately you've got all this software, all these great tools, but a journal requires a community to work it and it requires some level of technical support beyond what someone who...
Justin:
[42:50] Is maintaining OJS can do. You need, in particular, people who can manage the workflow so stuff doesn't get lost, tag people, copy editing. Copy editors don't need to be too specialized. So I've always wondered, why aren't we building publishing consortia where there are people on staff who know how to copy edit, who know how to ingest a manuscript, and who will say, like, okay, TDL has your OJS running. And since you're a member of this publishing consortium, your faculty members, your editorial board will work with this consortium to get those issues out. Because one problem we've been having in particular is if you want to be indexed anywhere, you have to publish regularly. And even for the easiest one, which is the Directory of Open Access Journals, you have to publish once every two years. And that's too much to ask for some people due to turnover, lack of interest, lack of rewards. And it's not like it can't happen. There's one journal where I had to say, you should really, if you can't get indexed, what's the point of making this a journal? Because otherwise, you can just put these essays into a collection in the repository. We don't need to deal with OJS anymore. The repository is easier to work with.
Justin:
[44:08] If you want to be indexed or if you want to have something scholarly, why not make a scholarly monograph? We have Pressbooks, another open source thing. Why don't you just make an edited scholarly volume? So the real issue is, one, do we need more journals, which I think we do and don't. We need journals where they need to be created. We also need commercial journals to flip to these OJS-hosted open source things because ultimately the journals are run by people, most of whom are public servants most of the time unless they work at a private institution. So from your work, because I know you work with a lot of different committees and across a lot of different things, how viable do you think that kind of model is?
Mike:
[44:49] Like, is that a world you would like to see? Yeah, I mean, other than copy editing, which I think is kind of the white whale here, right? This is the real cost and the real issue. And maybe layout editing on top of that, XML typesetting. Those are kind of the big ones, right? Like making your galleys and making sure nothing looks like crap. But everything else is pretty well taken care of. So Coalition Publica in Canada is kind of doing this thing. So the way they've managed to do it is you are a Coalition Publica member, many of them are library-published OJS journals, and they pull all that content in and then they sell that content through the Canadian Research Knowledge Network, CRKN. They're kind of our LYRASIS. They manage our, like, consortial ORCID and other things like that, and they also take part in, like, transitional agreements and negotiating with major publishers across the consortium. So they kind of handle a bunch of things. So the way it works is, an institution pays into Coalition Publica as a subscription, a lump sum of money. And then Coalition Publica is distributing that money for subscription journals. They get whatever their subscription is plus service fees. So they're doing some of the metadata correction. They're doing some of the intervening. They're doing some, like, DOI provision and some other pieces.
Mike:
[46:03] And then the journal collects income. And then they also do that for open access journals. So if you're a diamond OA journal, that money is just split based on, I think in some ways, readership. Based on a metric I don't fully understand, but you get paid. And so we had journals that were starting up here at our university that were sort of early in this agreement. And they were saying, we never would have thought we would ever... like, this entire enterprise was funded by money from this project for years, because we wouldn't have had it otherwise. And the other thing we have in Canada that apparently is not super common is one of our tri-agencies, the Social Sciences and Humanities Research Council of Canada, SSHRC, they have this thing called Aid to Scholarly Journals funding, where if you are running an academic journal in Canada, you can specifically apply for this funding for, like, operating costs. So that might include hiring a copy editor, or that might include your hosting fees for OJS or whatever else you need, typesetting or other pieces. I don't think any of these journals are breaking even on this grant money, or the Coalition Publica money directly anymore. And I think, you know, there is a lot of need for especially the copy editing piece. But I think we're making ground here in providing these supports nationally. And then the other part that's happening is the community of practice.
Mike:
[47:09] The library publishing folks who are also a part of Coalition Publica are talking to the journals. So there's, like, you know, journal publishing associations, and we're providing resources and expertise to those folks who are working together to solve problems. So if somebody wants to know, you know, how do I do such and such in OJS, we'll host a session or a community call. We just had one two days ago on flipping to diamond OA. And so we invited two journals that have flipped, and then a representative from Coalition Publica and a representative from CARL, I want to say, or maybe a library. And they all talked about this idea, how they helped journals flip from a for-profit, you know, maybe they were hosted by Wiley or whatever. And then they left, and then they went fully independent. And now they're self-sufficient. And that rules. So the copy editing piece, I think, is hard. I think it's just hard to, like, once you have distributed labor that way, you almost also need, like, distributed HR. Managing the staff is one level of apparatus higher, I think, than we're able to really suss out nationally. I would love it if you could just freelance people, but I think your quality would be highly variable. But I think it's really the only missing piece, the copy editing. But I think the other parts work if you can have a national platform like Coalition Publica that pulls together your library-hosted journals and your existing diamond OA journals and makes a package that is appealing for institutions to pay into. It's kind of like Subscribe to Open. You pay into it because you believe in the project. I think that's a big deal.
Justin:
[48:36] Yeah. And I think another part of it, right alongside the copy editors, there's still, like, people who are going to go through and name the files, upload them properly and do some administrative work for the journal, which is in particular getting the ISSN set up and getting indexed. And I still feel like that expertise could easily be shared because all you need is someone to come in and go, do this, do this, do this, do this, do this. I'll show you how to do this and then we'll keep you going. Because the one thing I worry about with my imagination, sort of my realistic imagination, not like my utopian dream. My realistic imagination of how we get to, like, open scholarship and non-commercial publishing is piece by piece. We flip these journals, we flip another journal, and then everyone goes, hey, everyone's flipping all these journals. Let's really do a tidal wave. And that will be a turning point that will be very, very impactful. But then I worry one of the things holding people back is getting into those traditional measures of prestige, which is Web of Science and Scopus, unfortunately. If we could abolish those... I don't want to abolish Web of Science. I think scholarly indexing is important. But them as, like, a measure of "this is legit," yes, I think is another very, very big problem. Yeah.
Mike:
[49:57] Absolutely. I just threw it in the chat, but PKP has had this document in varying iterations for, like, the last 20 years called Getting Found, Staying Found. And it's about all the things you'd need to do for your journal to be like a real journal, to get its wish to be a real boy. And so you've got your, you know, your journal standards and identifiers, getting an ISSN, getting in DOAJ, how indexers work, all of this stuff. PKP hosts, like, well over 500 clients with journals, and they don't babysit the editorial for these. They mostly just hand them the documentation and say, here, you're an adult, go for it. And, you know, they read through the content and they figure out how to index their journals. And if they're serious and they do the work, they make out okay. If they're big, smooth babies and can't solve problems, their journal disappears in a year. And I think that's just kind of the way it works.
Mike:
[50:50] But I think giving them the tools and having lots of, like, webinars and letting them know they're supported and that it's okay to ask questions. And I think especially, like, trust librarians and trust scholcomm folks who are doing this work. That's a real hurdle for people. Like, I have a journal editor, a managing editor, who just retired. And I would describe him as the most condescending motherfucker. Like, every time I talked to that guy, he was just looking at me like I was the biggest fucking idiot. And then he would ask me what the difference is between a website and a web page. And I just wanted to curb stomp him. I was losing my mind. And this guy was, like, hired to usher their journal into the digital age. I remember I showed him one time that you could do, like, Ctrl+C, Ctrl+V, and it fucking blew his mind. Like, I think there's an age of editorial that is quite literally dying off, of people who are intimidated by the idea that if something happens on a computer it must be difficult. And so I think we're seeing more folks adopt and embrace tools to do this stuff on their own. I think they're a little bit more adventurous. I think they're tired of, or radicalized by, people like me screaming at them in sessions when they came in to hear about what publishing is, so that they want to do something a little bit different. So, you know, generational change in academia is a long road, right? But I do think we're starting to see people understand that traditional publishing doesn't really work for them. And they're looking for other alternatives and they're willing to lean in a little.
Justin:
[52:12] Yeah. I mean, I'm going to push back a little bit on, like, letting the big smooth babies fall by the wayside, for two reasons. One, faculty have way too many responsibilities. And it's true that things are going to fall by the wayside. So if they don't get their journal indexed, they're like, well, I ran the journal this year. I didn't get paid any extra money to do it. I'm sorry I didn't index it. It's not going to get me fired. So I do like the idea of having, like, administrative staff who can help people index. And also, the problem is when people are under-supported, they burn out. And these are the people that we don't want burning out because they're not big smooth babies. They're people who believe in diamond open access. And it's like, if these people just go, well, this shit sucks, that can't be good either.
Mike:
[52:55] Yeah, no, you're right. That's true. Yeah. And I do hear you. I think we've had a good mix of editors that are well-meaning people who don't have time and who are toxic nightmares who just want to be difficult. We definitely have had a blend.
Justin:
[53:11] I mean, both exist, but when I'm talking about making a critical mass of getting diamond open access, I think we have to make sure we're not losing people along the way, especially the people who give a shit. Bullet point four. What do you wish librarians and library staff understood about metadata? Like more librarians should know about metadata.
Mike:
[53:31] I think... In my work... Metadata myth busting. Yeah. I think in my work it's kind of twofold. One, I'd alluded to earlier, that, like, more metadata is good. I don't necessarily believe that. Like, there's a point where you've sort of enriched a document enough that you can just let it go. But I think the thing that I see a lot from people who don't work in the space of open scholarly infrastructure is, like, a lack of understanding of what happens when a publisher publishes a work. Like, what hitting the publish button does, and what the, like, bizarre Rube Goldberg machine of open scholarly infrastructure does to the metadata when you do it. So, you know, I register a DOI when I publish, and then that metadata goes to Crossref. And then a bunch of that metadata goes to ORCID. And then it goes to OpenAIRE. And it goes to all of these other places. But then the other place it goes, it might not be super obvious, but, like, Mendeley or Zotero are ingesting metadata from a DOI, or someone's doing a repository ingest using a DOI, and they have an expectation that, like, the metadata will be good because a publisher provided it. Which is not a given. Lots of publishers could give a shit about metadata. They don't benefit from making their metadata good for anything other than their own indexing platforms. They don't have to give anything to Crossref if they don't want to, you know, other than a title and an author. I was working on this metadata project for multilingual metadata.
Mike:
[54:53] And one of the things we found was, like, a chunk of records from, it was Springer, I'll shame them. And they had, like, 10,000 or 20,000 records in our sample that just didn't have title fields. And title is a required field in Crossref. But what they did is they just jammed a single white space in there, just put up a bunch of records, and they just left them. And I know that they've been asked to update them, but they just don't. And so this idea, someone's like, yeah, I use Zotero, its metadata, everything came in in all caps. And I'm like, yes, because of publishers, not because Zotero is bad. And I think a lot of people don't understand that that publication point is where all of that stuff comes from. It all flows downstream. You have to think of it as, like, an entry point and literally a river of metadata, and all the other places that suck it up. And then when something gets fixed, it takes forever for that stuff to get distributed by all of those places so it can live. So it's really important when you hit publish that you mean it and you've thought about it. You shouldn't hit that button as a second thought or a formality. It's a big deal. Yeah.
Jay:
[55:55] This is, uh, when I used to work in an academic library and would do instruction, I was often the Zotero guy, because, you know, I was the person who knew how to use it, right? And I would always tell students, like, you know, this is magic and will do this all for you, but also double check it, because sometimes EBSCO, I love EBSCO, but sometimes EBSCO will, like, put the affiliation or the email address in the name field and not its own field. And so when Zotero brings that metadata in, you'll have, like, an email address or a title or an affiliation in the author's name. And then it fucks up the name stuff and everything. Yeah. Like, I was looking through the documentation that you linked to us, which, like, I love. I'm a documentation nerd.
Mike:
[56:44] It's great.
Jay:
[56:44] I love, I love these documents.
Mike:
[56:46] You showed you sent us um.
Jay:
[56:48] But like the thing of like metadata is not style.
Mike:
[56:50] Yeah and.
Jay:
[56:51] I was just like, stop putting it here just because you want it to show up. Like, I just felt that in myself. It's also an accessibility issue.
Mike:
[56:58] Totally it's.
Jay:
[56:59] Like how you shouldn't use an h1 except for the title of the web page because of screen readers, and you shouldn't use h2 or h3 or whatever just to change the size of something on your page, because that fucks with people who use screen readers. It's the same thing with metadata. It is serving a purpose.
Mike:
[57:18] Yeah, but also the people who make this software need to understand that when I ask somebody to jump through all these hoops, all they want is for their journal website to look a way. So it's like, in OJS, people are putting metadata in a place, like, I've seen a DOI in a title field because somebody wanted it to appear in the table of contents. And so they see that and go, like, "Tell them not to." And I'm like, I could tell them not to until I'm blue in the face. What they want is a DOI to appear on their title page. Can you make that possible? And they're like, "Oh, I guess so." Like, trying to figure out what emergent behaviors metadata rigor results in. Like, why are people jamming metadata where they're jamming it, and what does that mean? Where are we not meeting their needs? Maybe metadata should be more flexible. Maybe we should allow people to display three titles side by side if they want, but what we record is the metadata that actually matters. You know, maybe more options in display. But I think it's both of these things, where people want to express, you know, I've had to do indexing, or metadata work, in OJS for a poetry journal, and they're doing all kinds of wild shit. And that's great, I love that they're doing wild stuff, but it's also super annoying. And so there should be ways to accommodate the ways they're using the software, ideally.
Mike:
[58:28] Without them having to make their work look a way they don't want it to. But yeah, you're kind of forcing people to care about metadata, and metadata isn't something, like, I always say this: metadata happens to scholars, not for them. It's like, "Oh, I gotta fucking what?" Like, they just don't want to, and I get it. There's so much else going on. It's like, I know why you don't really care about this. I just need people to express some grace towards the universe of where this stuff is getting generated from, because it's bad and it's hard and not everybody has time for it. And it's not Zotero's fault that this metadata is dirty. It happened at the publisher end, and that's what it is. And I meet a lot of librarians who don't know that, or they look at an ORCID record and go, "This ORCID record is all fucked up." And it's like, all that, the publisher didn't put it in. I remember, so PKP is a sponsoring organization for Crossref. So if you're an OJS user and you're in a country that's on their GEM list, their equitable membership list, then we sponsor your Crossref membership. And I remember being on these service provider calls with all these other people from publishing, and people saying, "Crossref, you really need to make sure your metadata is better." I remember going, like, what do you want them to do? Correct it? Correct 1.6 million records? Like, what are you talking about?
Mike:
[59:43] Just make sure your metadata is good. What do you mean? Like, you don't want Crossref editing the metadata you've deposited. That's a can of worms nobody wants to open. Yeah, it's just a weird situation. So this idea of stewardship and who's owning their shit in the metadata space, I think, is really important. And I think that's the thing that a lot of people just miss out on. And I think it happens because for years people have just had this work be cataloged. Catalogers are actually very good at this. I see people in the repository space all the time trying to reinvent cataloging, and I'm like, listen, you don't need, you don't need.
Jay:
[1:00:18] I'm right here you know you don't need to.
Mike:
[1:00:20] Write new rules. There are very esoteric rules, and they exist, and you should just use them. You don't need to do this again.
Jay:
[1:00:27] I was about to say, like, MARC already allows you to have parallel linked titles. Like, I do Arabic cataloging sometimes, and you can have the romanization and the Arabic script linked in two separate fields so that in your OPAC they will show up on the same line. Like, in the article you shared with us about metadata across different cultures, it's like, that's a thing that MARC has been doing. Yeah, you know, like, it's not hard.
Mike:
[1:00:57] MODS is secretly a very good XML schema.
Jay:
[1:00:59] I love MODS, I love MODS. Fuck Dublin Core, MODS is where it's at. Like, Dublin Core is so basic that it then forces people, they think, oh, it's so flexible because it's just like a title and a name, like, whatever, there's like two fields. But what that means is you have to use a notes field for six different things, and there's not a field for this exact thing you wanted to do. Its flexibility means it makes you do weird stuff that then isn't really compatible across systems. Or something like MODS that is more like, here's a field for this exact thing you want. It's like, thank you, MODS. Yeah, it just works better. They tell you exactly.
Mike:
[1:01:40] What it's for you know exactly what there's.
Jay:
[1:01:41] No ambiguity this field is for.
Mike:
[1:01:43] This thing in this situation, and it's very straightforward. Here's the attribute, here's the element, here's what you do. I mean, Dublin Core is only flexible because it's essentially bumless, right?
Jay:
[1:01:50] Yeah i.
Mike:
[1:01:53] I really don't, I really don't like it, and its ubiquity makes me.
Jay:
[1:01:55] Very mad yeah so.
Justin:
[1:01:57] Backing up a little bit, well, not backing up a little bit, because we've been talking about lots of different organizations, why don't you explain what Crossref is and why it's important?
Mike:
[1:02:09] Right. And I would lump DataCite largely into the same conversation. So probably one of the biggest misconceptions I hear from journal editors often, and even some repository folks, is they'll just say like, "I minted a DOI." And they think just making the DOI is how a DOI works. But for DOIs to work, they require a third party. They require a registration agency. And that registration agency is storing the metadata. And it's where the DOI takes you to resolve to the URL that's stored. So there are two major ones, at least in the sort of Western publishing space. There are other DOI providers, like Japan has its own DOI registration agency, and there are a few others globally.
Mike:
[1:02:43] But Crossref's kind of publishing-flavored, and DataCite's kind of repository-flavored. And each of those accounts for the metadata they store. Crossref and DataCite work together, and they share metadata back and forth. But the whole idea is you get a DOI for an article, a unique identifier for an individual publication. All of its metadata is stored alongside that record, including the URL. When you click on a DOI, it takes you to the thing. It's not like a bit.ly, it's not like TinyURL, and it's more than a handle, even though all DOIs are also handles. So Crossref is an organization, they kind of sprung out of, and I think I made this comment in our Discord, that Crossref's biggest political hurdle is that a lot of people see them as basically an op for publishing, big publishers in particular. And I do understand where that comes from. But they store metadata specifically related to academic articles or conference proceedings or other specific academic publications. And their metadata more or less aligns with JATS. And they record a lot of it. And Crossref is unique in that they also store reference metadata. That's the cross-ref part of Crossref. So that you can see when other people have referenced your work, because it's deposited in the metadata of an article, and you can sort of track the relationships of works in general.
Mike:
[1:03:51] Crossref are enormous. Their API, which is public, is hugely used, and a major piece of OpenAlex and OpenAIRE and ORCID, and all of this stuff sort of relies on the Crossref API. So if something doesn't use DOIs, it's essentially invisible.
Mike:
[1:04:09] And DataCite still has lots of visibility too, but their metadata isn't as rich for journal articles, say, as Crossref. So that's who Crossref is. And I think why they're important is just that they've filled this gap that was missing in open infrastructure, which is the place the data is stored so that it could be pushed to all these other places. Their open API is great, and I'm glad they have it. They have, I think, really generally put their money where their mouth is in terms of being a major part of open scholarly infrastructure. They hand that metadata to whomever. They are not picky. Their membership fees, like, I know, you know, for some, DOIs are not affordable, but a dollar a DOI is not crazy. And Crossref have recently said that over 50% of their revenue comes from smaller independent publications and not the major publishers. So it's clear that lots of people are using their services.
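Editor's sketch: the distinction Mike draws (the DOI resolves to the thing, while the registration agency's API hands out the metadata) can be shown with two URLs built from the same identifier. This is a rough illustration only. The helper names are made up for the example, `10.5555/12345678` is a placeholder DOI, and no network request is made; the two base URLs are the public doi.org resolver and the Crossref REST API works route.

```python
from urllib.parse import quote

# Common ways a DOI gets written down; stripped so we can work with the bare identifier.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def bare_doi(value: str) -> str:
    """Strip surrounding whitespace and any resolver or 'doi:' prefix."""
    value = value.strip()
    lowered = value.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            return value[len(prefix):]
    return value


def resolver_url(doi: str) -> str:
    """URL a reader clicks: the resolver redirects to the registered landing page."""
    return "https://doi.org/" + quote(bare_doi(doi), safe="/")


def crossref_works_url(doi: str) -> str:
    """URL a machine asks: the Crossref REST API record for a Crossref-registered DOI."""
    return "https://api.crossref.org/works/" + quote(bare_doi(doi), safe="/")
```

For example, `resolver_url("doi:10.5555/12345678")` and `crossref_works_url("doi:10.5555/12345678")` point at the same work, one for people and one for harvesters.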
Mike:
[1:05:00] But I think it's that connective tissue, right? It's like the thing that's holding all of the metadata that gets pushed into all these places, because otherwise you would need to be indexed by ORCID and all these things individually. But if you push to Crossref, then that metadata is just available, and out it goes. So that's why they're important. And I think at this point, like, I don't want to say too big to fail, because that's an enormous jinx, but they're kind of necessary. Like, if we lost Crossref tomorrow, I think it would be a nightmare. And I really worry about what would replace it. Elsevier-owned Crossref would be a true nightmare. It would be truly, truly awful. And I do think, like, I remember I was in a PKP sprint with a Crossref staff member, and one of the OJS community members said, you know, we're really worried about this idea.
Mike:
[1:05:43] Like, Crossref is centralized, and it's a risk. If something happens to Crossref, we're all fucked, and it's not good. It really should be distributed. And we're really worried what will happen if you get acquired. You know, we've seen this happen a million times in publishing. It would be a big mess. This developer was like, look, I love working at Crossref, but we know that ideally, we don't really need to exist, because this could be distributed and other people could handle it, and the infrastructure is open and people could just take it and use it. So I would not like to lose my job, because I like working at Crossref. But ideally, the platonic ideal of what Crossref is doing is that we don't actually need to exist as an organization and this just happens. But whether or not that happens, you know, they just announced they're moving from their own self-hosted software to AWS, which is a move in the opposite direction. So I don't totally know. But they're vital, and as far as I know, they're not ghouls. I was on ghoul watch, and I don't think they're ghouls yet.
Jay:
[1:06:38] It's always, like, pleasant. Like, it's always nice, but I'm pleasantly surprised when a vendor's not a ghoul.
Mike:
[1:06:43] Yeah, totally. You know, I mean, Crossref, in the work I've done with PKP, I remember they invited me to a Crossref LIVE event. They don't do those in person anymore, but it was in like 2019, the last one they did. And they sat me across the table from an IEEE guy. And I just gave that guy shit the whole weekend. And they were like loving it. I was like, great. Like, I expected them to kind of be like, "I can't believe we invited this loudmouth." And they seemed to be appreciative that I was giving this guy a hard time. And that's how I knew that they were at least a little bit in our corner. And they've always been huge supporters of OJS. When they contact us, it's like, "You know, you guys have a bunch of members in all these places all over the world that are on an old version of OJS. Can you help us help them find a way to update their software more easily?" And that's exactly the kind of question I like to get from an organization like Crossref.
Jay:
[1:07:29] Yeah.
Sadie:
[1:07:29] I commend you for giving IEEE some shit.
Mike:
[1:07:33] Oh man, anytime.
Justin:
[1:07:35] Could you tell us a little bit about the work that you've been doing with, is it Niso or Niso? I never know how to pronounce it.
Mike:
[1:07:40] I think it's Niso.
Justin:
[1:07:41] Okay. Because I've been hearing a lot of great stuff, and I want to share it with the world.
Mike:
[1:07:45] Oh, that's nice. About my work or about NISO?
Justin:
[1:07:49] Yeah, about what you've been doing at NISO.
Mike:
[1:07:52] Yeah, so in 2008, NISO released this spec, it was a recommendation for journal article version language. I think Peter Suber was one of the folks who was on that group, and a handful of others. And the idea was basically to find, you know, generalized terms for how you can describe an article that goes through publication. And those phrases are phrases we use every day: accepted manuscript, submitted manuscript, proof, all of these other things. They're very aligned with a publisher's workflow. And then in 2021, 22, I feel like I've been part of this working group for like an eternity. It was supposed to last like a year and it's been like three. I keep ending our calls with like, if any of you want this to last any longer, you'll have to fly to my house and kill me.
Mike:
[1:08:32] Just, like, losing it. But the idea was to sort of solve a lot of these problems where, like, preprint is a ubiquitous phrase, but it isn't one of the journal article version terms, or how we account for some of the problems. This has emerged a lot since we got our public feedback, with post-publication peer review and sort of addressing those issues. So the idea is trying to take these terms and make them a little bit more useful to people, and then also make recommendations to publishers about transparency, which they're frankly bad at. Most people are bad at it. Like, let's say, what it used to be is you had a version of record, a term that is beloved by publishers but more amorphous than they're willing to admit. And so you get a version of record of an article and then you change it. And there used to be a correction. And what we've done instead is said, okay, well, why don't we use semantic versioning? This is version of record 1.1, 1.2, 1.3, patch language. Let's have this make a little bit more sense, which is, I think, hard for a lot of people in this space to wrap their heads around, because they're not developers and they're worried about what these things mean. And they're also worried about the implications of semantic versioning and, like, editorial. Like, is a metadata edit a point zero one, or is it not? You know, and these are, I think, reasonable questions to ask. But the other thing we want is, like, change logs.
Mike:
[1:09:48] Like, if you update an article, tell us what you updated, and when you updated it, and why you updated it. Have an update policy, have a semantic versioning policy, tell us what you're doing with your work, make those old versions available so that people can see those versions. This is one of the major conversations we had: if a person cites a work, and then you visit it, and it's no longer the same version as the version they cited, do you want to be able to see the version they cited? So a lot of the conversations are about making this language make more sense to consumers of articles and a little less focused on specific workflows of, you know, a certain specific publisher. But to NISO's credit, one of the things they've done is they've invited a very wide range of stakeholders. Like, I really expected, and I think a lot of people expect, that what NISO is going to do in this case is just invite a handful of major publishers and their varying glad-handing cronies. And instead what they've done is they've invited a bunch of people from universities and library publishing and repositories. We've got people from arXiv. We had people from eLife. We've got people from.
Mike:
[1:10:48] Silverchair and someone from Wiley and Taylor & Francis and SSRN. And we've got this really wide range of people, and everybody's bringing something a little bit different to the table. And I think it's nice that they're putting us all in the same room so that we can fight with each other about what scholarship is supposed to be. And I think that's probably why it's taken three years for us to write this thing. In some ways, it's one of the most exhausting pieces of work I've ever done. It's like, you sit down and someone's like, "We need a subgroup about the definition of an article." I'm like, could you please kill me now? I can't think of a way that would be more brutal for me to spend my time than the definition of an article. But I wrote a big chunk on the definition of preprints and basically said, look, preprints are so amorphous. Who knows? Here's author's original, here's submitted manuscript. One of these is mostly for repository people, one of these is mostly for publishers. So I think the work is ultimately rewarding, albeit exhausting. But I think the idea of semantic versioning and journal article versions, it could be really good. But because it's just a recommendation, I don't know who's going to adopt it. I do know on the PKP side, as soon as I revealed this, I had developers saying, "We would love to do this. Let's talk about incorporating this in the next version." So they're already excited and they're already moving in this direction. I think publishers, based on the public feedback we'll release later, are a little less thrilled about some of this language. But I think that's to be expected. They have a real vested interest in making sure that the version of record means the things that they always wanted it to mean, the.
Justin:
[1:12:14] Indelible version the the truth.
Mike:
[1:12:16] For the capital t the truth yeah absolutely the.
Justin:
[1:12:21] Yeah, the version that does not change, even though there's a field for corrections, and there were more retractions last year than, like, what was it, five to 10 times more papers retracted last year than the year before.
Mike:
[1:12:34] Um there's.
Justin:
[1:12:36] There's a serious scholarly communications crisis, but we don't have time to get into that. When you're talking about versioning, though, like, if there's version of record one and version of record two, are they given different DOIs, or is that in the DOI metadata?
Mike:
[1:12:48] Boy, what a conversation this was. The idea, I think, so we couldn't make a specific recommendation. No, I know, and it's a very reasonable question. There are people who wanted a DOI for every individual citable version of a work that could ever exist, without, I think, really understanding the potential cost, right? Like, you're registering a new DOI. That could be quite expensive.
Justin:
[1:13:09] Because it's what Zenodo does when you update stuff, I think.
Mike:
[1:13:11] Yeah, Zenodo does that. I have no idea, like, CERN I guess just has a bottomless DataCite account. Infinite money, yeah. So it doesn't matter. And Crossref can do relational metadata for free. Like, if you update, you can update your metadata for free, you can associate two works for free. There's some of this stuff that doesn't really have a cost associated. But I think the idea is that the DOI should take you to the landing page, and then you can handle the rest with, like, you know.
Mike:
[1:13:40] A URI or something along those lines. If you want a DOI, you can. But part of the issue is that like, we can't speak for how DOI registration agencies will take our recommendations.
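Editor's sketch: one way to picture the arrangement discussed in this stretch (a single DOI pointing at a landing page, semantically versioned versions of record each reachable by their own URI, and a change log saying what changed, when, and why). This is an illustration under stated assumptions, not the working group's recommendation or any platform's data model. The class names, field names, DOI, and URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class VersionEntry:
    version: tuple[int, int, int]  # (major, minor, patch), compared like semantic versions
    uri: str                       # where this exact version stays available
    released: date
    what_changed: str
    why: str

    @property
    def label(self) -> str:
        return "VoR " + ".".join(str(part) for part in self.version)


@dataclass
class Article:
    doi: str            # one DOI for the work; it resolves to the landing page
    landing_page: str
    versions: list[VersionEntry] = field(default_factory=list)

    def publish(self, entry: VersionEntry) -> None:
        # Refuse to go backwards or to overwrite an existing version number.
        if self.versions and entry.version <= self.versions[-1].version:
            raise ValueError("new version must be higher than the latest one")
        self.versions.append(entry)

    def latest(self) -> VersionEntry:
        return self.versions[-1]

    def find(self, version: tuple[int, int, int]) -> Optional[VersionEntry]:
        # Lets a reader retrieve the version someone actually cited.
        for entry in self.versions:
            if entry.version == version:
                return entry
        return None

    def change_log(self) -> list[str]:
        return [
            f"{e.label} ({e.released.isoformat()}): {e.what_changed} Reason: {e.why}"
            for e in self.versions
        ]


# Hypothetical example data.
article = Article(doi="10.5555/12345678", landing_page="https://journal.example.org/article/42")
article.publish(VersionEntry((1, 0, 0), "https://journal.example.org/article/42/v1.0.0",
                             date(2024, 1, 15), "Initial version of record.", "First publication."))
article.publish(VersionEntry((1, 1, 0), "https://journal.example.org/article/42/v1.1.0",
                             date(2024, 6, 3), "Corrected Table 2.", "Author-reported error."))
```

Keeping every entry, rather than overwriting the record, is what lets a reader who follows a citation to version 1.0.0 still see what was cited.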
Mike:
[1:13:50] So we have a member of Crossref there. And it's like, I can't tell Crossref how to do their service because they already all have sort of small adjustments on the DOI spec and what they do or do not offer. So we just kind of had to say like, look, talk to your agency and make sure you have relational metadata included to prior versions. You connect these things so that they can be connected generally to data sets and stuff like that. And that's the best case scenario. So one DOI where you could see the other versions of the work, that would be great. I don't think we really know what will happen in a like VOR1 and a VOR2 kind of question. I think publishers aren't interested in answering that question. And weirdly, the people who probably are the most interested in answering that question are the post-publication peer review folks. So we've had to make a new subgroup. We had a little bit of an explosion in a meeting. Kind of a new subgroup made to try to find a way to accommodate post-publication peer review in a way that wasn't like, this is how eLife, or this is how F1000 does it. But let's make a generic post-publication peer review workflow and make sure these terms work. And if they don't, because the big gripe is that these people use VOR for an article, whether or not it's been peer reviewed, because it might be peer reviewed later.
Mike:
[1:14:56] If we need an extra term, then maybe we do it. But that's a double-edged sword too, right? Because I think you're saying, well, then these are less than a VOR, which is going to upset the post-publication peer review people, or you're saying it's equivalent to a VOR, which will upset publishers. I almost always fall on the side of upsetting publishers. I think it's fun and rewarding. And so I'm happy to push that through. And I think we have enough people where we can kind of have this discussion lean that way. But we'll see.
Mike:
[1:15:22] Ultimately, it's like, I have to not wild out too much in these meetings, because I'm worried they'll kick me off the committee. But I'm a co-chair, so I'd really have to wild out super hard. So far it's worked out, and I think we're well on our way. And I'll say this: some of the people I thought would be really, really obnoxious, publisher-wise, have actually been sweeties. One of the co-chairs is with ACS, and he's just been an absolute sweetheart. He's like, "Hey Mike, what's your shirt say?" He's the friendliest dude. I love him. He's never fighting with me about any of this stuff. Every now and then he'd be like, "You know, my bosses kind of don't like this thing," and I'm like, that's okay. I love it, it's so good. So yeah, I think it's been rewarding, and I can't wait for, like, when I can finally die. I just want it to be over.
Justin:
[1:16:09] What I want to know is, do you have any more stories from these publisher conversations around the table of this committee? I just want to do like a little tea spilling ceremony.
Mike:
[1:16:21] Yeah, I mean, I think a lot of the people who are representing, like, our Wiley rep is someone who has been acquired by Wiley and used OJS for years. Our Taylor & Francis rep is a very well-meaning wonk who just wants the language to be nice. I don't think we've had anybody who's pushed back. Honestly, we got in our first fight this last year, and it was about post-publication peer review and eLife, and a handful of us really, really pushed back on this guy. Weirdly enough, the librarians have actually been kind of the most annoying: people who insist that all repositories do it this way, and then I have to say, that's absolutely not the case. Like, it is how you're doing it, it's not how everyone does it. "Well, this is how libraries do it." And I'm like, no, it is for sure how you do it. So I think that stuff has been good. I think it's felt like pulling teeth, like I've really had to drag people through giving a shit about this, which is a little frustrating. Working with NISO has actually been great. The person who originally recruited me to be on this thing is from my province. She's lived in the States forever, but she was just like, "Hey, you're from New Brunswick and you work with PKP."
Mike:
[1:17:35] "That's perfect, you're fitting this thing perfectly. Let's chat." And they've always been, like, I expected, in a very conspiratorial sense, for these folks to be kind of steering me in the direction of the status quo. And generally, they've been very accepting of shaking things up and having this be more representative of what publishing looks like now, which has been a real treat. I really didn't expect it. I thought for sure I'd be doing a lot more fighting in this committee than I have been. It might just be personalities, but it's kind of hard to say. I do wish that some of the voices that had dropped off the committee had stayed on it. Like, we had an eLife rep, and eventually they just had to step down. In a committee like this that lasts three years, some people just, they get a new job. Like, I think probably five or six people on the committee ended up in a new job, and they stepped down as a result. So we lost a lot of potential voices. I would have loved to have eLife around to have this conversation. But, you know, really, it's been surprisingly trauma-free, except for when this guy from SSRN wilded out about eLife. And then I had to see if we have a code of conduct. It's not something I was expecting to have to do. We don't really have one, by the way. It's basically just be nice to people.
Justin:
[1:18:44] There's not a nice standard code of conduct.
Jay:
[1:18:47] Nice.
Mike:
[1:18:48] Yeah, nice.
Justin:
[1:18:51] So it's a nice number for 20.
Mike:
[1:18:53] They acknowledge that it is very old. They said, this definitely needs updating, they said to me. And I will not volunteer for the working group to update the NISO standard.
Justin:
[1:19:02] No, no, no. If you're on another committee that's gone three years, you don't have to sign up for another one to fix something.
Mike:
[1:19:08] No.
Justin:
[1:19:08] It's like jury duty.
Mike:
[1:19:09] Yeah.
Justin:
[1:19:10] So this is the question I tried to make sure to prep you for ahead of time, which is, like, in your ideal world, what would this publishing system look like? And the example I gave is, like, after the revolution, is there still Crossref?
Mike:
[1:19:24] Yeah, and not as we know it, I think, is the answer to this question. Like, I do think that distributed systems are the way to go. Obviously, we've seen a lot of people get acquired. We've seen, you know, varying problems where an Azure server goes down, and then a million websites are offline. I think distributed hosting in this space is the ideal, the platonic ideal. And I think DOIs meet that same criteria. When we were talking about this in the Discord the other day, I mentioned that CORE had had this conversation about investigating a spec for something to kind of replace the DOI as an identifier that was distributed. And I don't really know if that's going to have legs, but it would be nice if it did. I think what I would like this publishing system to look like is, obviously, just a lot less capitalism. That'd be super cool. We could just have less blatant extraction. I think the overwhelming theme of the last bunch of years is I feel like I'm fighting with Skitchen and all of these other folks who exist in a universe that no longer exists, and they believe publishing to be infallible. They don't understand that people are doing peer review with ChatGPT and that that's maybe a problem. They don't understand that peer review doesn't mean what it used to mean. We had a lot of feedback in NISO about peer review, and why isn't peer review included in the VOR description? I had to say, well, because lots of things that are versions of record haven't been peer reviewed. It's just the case.
Mike:
[1:20:52] And also, peer reviewed isn't this pinnacle of quality the way you believe it to be. Peer review is flawed. I think what I'd like to see is more people sort of investigate the status quo and think about what they're doing. More established professors kind of understand that this is a problem and not just do it passively. When I talk to researchers a lot, I always tell them to like.
Mike:
[1:21:12] Publish with intention. Think about where you want to publish and why you're publishing there. Don't come to me after and go, I thought I needed to do this, but they want an APC. And instead go, I knew exactly what I was getting into when I published here, what kind of stuff they publish. And I thought about it. Where you publish matters. I think it's a big deal. Less emphasis on metrics would be beautiful. More emphasis on people not getting screwed or publishing under duress. We're just starting to look into our assessment criteria at our institution. I would describe them as light for what we get paid and we're faculty and it's really nice. But we don't have any publishing requirements. And I don't think publishing requirements for librarians are good. I think the state of library publishing is an indicator.
Jay:
[1:21:57] It's real bad.
Mike:
[1:21:58] Yeah, I think a lot, because a lot of the people who are doing library publishing never had to do, like, a PhD in comprehensive research and all this other... It's like, a lot of the time it feels like a book report to me. And so it's bad to make people publish. But then I see my colleagues going like, "Oh, you know, we've got so much rigor," and I'm like, I'm literally on a Discord where one of the channels is "I'm looking for co-authors," because people have to publish for their jobs in so many other places. And I think we need to have some kind of balance here between intentional publishing and other pieces. These are more problems with academia in general. But I think a distributed system that meant that these software platforms were managed, you know, in open source software, by people who are pushing content out to open scholarly infrastructure, that researchers didn't just let all this stuff go out of this, like, preemptive capitulation, would be super cool. And literally any engagement with the idea that how you publish matters in any way would be a real treat, beyond just the phrase "high impact journal." If I could just flush that phrase down the toilet forever, I would do it. So those are kind of the broad strokes. I think Crossref does good work, but I think even they know that that super centralized one big API on an AWS server is not a forever thing. And so I hope that there's a better world ahead. Of course, recent events indicate it is harder and harder to expect that that may happen.
Justin:
[1:23:25] So Crossref is like a big ledger, right? So what if we had a distributed ledger?
Mike:
[1:23:34] No!
Justin:
[1:23:34] That way, everyone has the transparency necessary to see where the ledgers all lived.
Mike:
[1:23:40] You know, one time I was in a conference, and it was a guy from the New York Public Library talking about using blockchain for all of their circulation information. And I was like, what the, you can't do this. Like, I can't think of a worse idea than having a public ledger for all of your circulation statistics. Like, what are you, what? Jesus.
Jay:
[1:24:03] I mean, Ex Libris already collects all that shit, and it's the most unsecured shit.
Mike:
[1:24:08] In the whole world anyway like jesus christ what's.
Jay:
[1:24:12] One more thing you know.
Mike:
[1:24:14] Yeah it's not often i hop into the chat and zoom to just write what the fuck are you talking about but no stop it stop that i know you like dpla.
Jay:
[1:24:23] Was gonna do some blockchain shit at one point i was like girl stop.
Mike:
[1:24:27] Why brutal just.
Justin:
[1:24:28] Make a backup there you go now it's distributed two places.
Mike:
[1:24:32] I do want to mention before we're done, real quick, just because Jay had noted multilingual metadata and some of the issues in multilingual metadata, and the challenge of, like, flattening language and culture. I think it's actually a really interesting problem. And I think it's an interesting problem in two ways. One is that multilingual metadata, as we know, is more labor for the people who are multilingual. So those journals are going through significantly more effort to make sure that they have, you know, three titles. And then they have transliterations of name metadata, and they've got all of this other stuff. And they're doing it predominantly for indexing, but all the major indexes have, like, English-language predominance as a sole selector and validation process, which really sucks. So the problem is those folks are doing way more work and it's not getting passed downstream because...
Mike:
[1:25:18] English is frustratingly like the lingua franca of publishing in a lot of ways. It's not good. So I think the best way for standards and metadata is actually for all of the organizations that are doing this work to understand that it's okay if something has more than one title, or it's okay if an author has more than one expression of their name, or more than one expression of an affiliation. In the research we're doing, I didn't share a preprint for this, but I think I alluded to it.
Mike:
[1:25:43] Crossref, for example, takes multilingual metadata when you give it to them, and they store it in XML, but when you access it via the API, you only access whatever the first language was. So if you have three abstracts, and they're in, like, English, Spanish and Portuguese, and someone pulls that metadata from the Crossref API, you only get the first one, English, but they have the other two. But the question is, like, where would somebody pulling that metadata put it? And where do you expect it to go? And I think Crossref have expressed that they want to find solutions around this, because they know people are doing the extra work to have it collected, but where it gets distributed outwards is a major problem. And I think, honestly, one of the places where we should be really pushing people is the citation people. Like, MLA should know what to do with this. APA should know what to do with this. All of the reasons this stuff is as rigid and stupid as it is come down to citation styles. And so I think there's a bit of a blind spot here, where people are like, "Well, you know, Crossref just needs to know how to store this," and I'm like, no, citations need to know how to handle multilingual content too, and they're very, very bad at it. They expect a canonical title. And I think that's a major part of the problem that's requiring people to basically, like, flush their cultural identity down the toilet in order to fit...
Jay:
[1:26:55] ...a first...
Mike:
[1:26:55] ...name, last name metadata.
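A minimal sketch of the flattening described here, in Python. The record shape and the function names are hypothetical illustrations, not Crossref's actual schema or API; the only point is the difference between handing back the first-deposited language and handing back every language with a tag attached.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DepositedRecord:
    """Hypothetical record: language tag -> text, kept in deposit order."""
    titles: Dict[str, str] = field(default_factory=dict)
    abstracts: Dict[str, str] = field(default_factory=dict)


def first_only(values: Dict[str, str]) -> Optional[str]:
    """Flattening: return whichever language was deposited first."""
    return next(iter(values.values()), None)


def all_languages(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Alternative: keep every version, tagged with its language."""
    return [{"lang": lang, "text": text} for lang, text in values.items()]


# Assumed example data: three abstracts, English deposited first.
record = DepositedRecord(
    abstracts={
        "en": "An abstract in English.",
        "es": "Un resumen en español.",
        "pt": "Um resumo em português.",
    }
)

print(first_only(record.abstracts))
print([entry["lang"] for entry in all_languages(record.abstracts)])
```

The second function is the easy part. The open question raised above is what a downstream consumer, such as a citation style, does with a list where it expects one canonical title.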
Jay:
[1:26:56] Yeah, so I mentioned in the notes, like I said, I do Arabic-language cataloging. And for folks listening who don't know, if you are in, like, MARC 21, you know, traditional bib cataloging, and you're going by Library of Congress guidelines for their romanization, they treat all Arabic language... no matter, like, if you don't know, there's not, like, one "Arabic language," there's a bunch of dialects of Arabic, right? And LC goes, nope, it's all classical Arabic, it's all the Quran, that's the only thing that has ever been written in Arabic, ever. And also, the way that things are pronounced, even in Modern Standard Arabic? No, don't care. You have to romanize it literally and not how it's pronounced. And I'm like, cool. But then I was cataloging a Palestinian zine for Read Palestine Week, right? And a lot of the names were not already controlled names, right? And they were English... I hit my microphone... like, English romanizations of their names, but they were, like, by the...
Jay:
[1:28:05] ...people whose poetry was in the zine. But when I had to then make, not formal controlled headings, but when you make a 700, like a 7XX field or whatever, you do it as if you were making a name authority file. So I had to reverse engineer what these people's names are in the Arabic script, and then what the LC romanization of that would be, which then completely removes the Palestinian pronunciation of their own names. And I was like, this is bad. Especially in a time where Palestinian culture is being actively destroyed, like the dialect and the language itself, by erasing that and flattening it, that's, like, part of the genocide. I'm like, this is bad, I shouldn't be doing this.
Mike:
[1:28:54] Yeah so.
Jay:
[1:28:55] I was just like how does that even like how do we make that not happen.
Mike:
[1:28:58] Yeah, I was thinking about this because I listened to your last episode about, like, Indigenous metadata. And in Canada we've had this conversation a lot: if you're from a specific community and you don't want to put your colonizer's country name as your country in metadata, there's no ISO standard. Like, I'm in the land of the Mi'kmaq and Wolastoqiyik. That's not... I can't... that doesn't exist. And there's so many of these subdivisions, and those ISO standards are rigid. If you put something in there that doesn't belong in there, it can break things. It will just say, this is invalid. This has to be a country. And so you're like, we're doing all this work to make sure that we're doing, like, decolonization of publishing, and we're trying to meet all these standards. And then someone who's from, like, the Squamish region in BC can't pick that as their nation, and they pick Canada. And they're like, oh, cool, I guess I had to pick my colonizer as my country of origin. Like, that stuff sucks. And it's like, there are... and then people say, can't I just change this? And the really frustrating part is, like, well, you can, but here's all the stuff it breaks. And so these are the two pieces that are really frustrating.
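To make the "this has to be a country" failure concrete, here is a small Python sketch. It assumes a validator that checks a value against a fixed list of ISO 3166-1 alpha-2 codes (only a tiny subset is included here). The `build_affiliation` function and its `nation` field are hypothetical, not part of any real platform or standard.

```python
from typing import Dict, Optional

# Tiny illustrative subset of ISO 3166-1 alpha-2 codes, not the full list.
KNOWN_COUNTRY_CODES = {"CA", "US", "MX", "BR"}


def validate_country(value: str) -> str:
    """Strict check: anything outside the fixed code list is rejected."""
    if value not in KNOWN_COUNTRY_CODES:
        raise ValueError(f"{value!r} is invalid: this has to be a country")
    return value


def build_affiliation(
    country_code: Optional[str] = None,
    nation: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical looser record: the country code becomes optional and a
    free-text, self-identified nation is stored alongside it."""
    if country_code is not None:
        validate_country(country_code)
    return {"country_code": country_code, "nation": nation}


try:
    validate_country("Squamish")
except ValueError as err:
    print(err)

print(build_affiliation(nation="Squamish"))
```

The looser record is easy to write, but any downstream system that assumes `country_code` is always present would still need to change, which is the "here's all the stuff it breaks" part.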
Justin:
[1:30:04] Yeah, when Jay told me about how... because I think in one of the zines, they had their Arabic name in Arabic and had transliterated themselves into Roman letters. So you know how they want their name to appear romanized, and you have to ignore it. I feel like that's, aside from the genocide, which is not a thing you should normally say... but on a human level, names are very important to humans. And if it's like, "I spell my name this way," like, this is one of the beauties of ORCID: your identifier is a number, and however many different ways you want to write your name, that's fine, because it all comes back to the ORCID.
Mike:
[1:30:49] Yeah.
Justin:
[1:30:49] And so it's so crazy to me that we have this ORCID system that does it right, and that Jay's telling me about this system where it has to be in classical Arabic, and he can't romanize the names the way the person themselves, whose fucking name it is...
Jay:
[1:31:03] I did make sure it was in a contents field, but still, you know.
Mike:
[1:31:06] Yeah, you have this... it's like, oh, ORCID is a little bit like being barcoded. And you're like, well, it's creepy. And then you're like, but actually, barcodes are pretty convenient. It's actually kind of good.
Jay:
[1:31:16] Yeah. Like, I have my dead name for all the world to see in my ORCID, because my thesis and the two scholarly articles that I've ever written that actually, like, get cited and shit are in my dead name. And it would be a pain in the ass to ask the journals to change it, because that doesn't get pushed to someone's Zotero or something that's already been published. My name's not going to magically change in all the places it's cited. So it's like, fuck it. Just put it in the ORCID. People can know. I don't fucking care. And that just makes my life easier, because then I don't have to go be like, man, can you please change my name on this thing I published in 2017? It's like, I don't fucking care. Just put it here.
Mike:
[1:31:55] It's fine. I mean, they probably would. And this is, like, an ongoing conversation about what you can change. But this brings me, actually... you want my ideal world? Here's two hills I'll die on. One, all name fields should be a single string, and two, there should only be one citation language. I will go to the fucking grave for these things. All names single string, one citation language. That's it. That's all I care about.
Jay:
[1:32:16] That sounds like something some Christian...
Mike:
[1:32:18] ...fundamentalist would use...
Jay:
[1:32:20] ...as, like, fearmongering about the Antichrist. Like, they only want one...
Mike:
[1:32:25] ...citation language, and...
Jay:
[1:32:28] ...they'll be barcoded.
Mike:
[1:32:29] They want to eliminate Vancouver. Vancouver's beautiful.
Jay:
[1:32:35] I mean, one of the Skitch people is... never mind.
Justin:
[1:32:38] Jay, one time, didn't you just put your ORCID in the name field because you didn't want to put a version of your name? Is that what happened?
Jay:
[1:32:46] No, no. What I tell people to do, and Brie Watson, shouts out, friend of the pod, has written about this and mentioned me as a way of doing it this way: I tell people in my ORCID, if you're going to cite something that is published by anything other than J. Colbert, so if you are citing something published by my dead name, use my initials. Like, use "JL." That way, in, like, APA citation language, it's going to be the same no matter what my first name was when I published it. And also put my name in the citation. So ideally there should be a field in a bib citation for an ORCID, or some kind of identifier, so it can be like, hey, this is this name, but then here's where you can go back and look at this person's works, as an easier way of tracking them down. So it's like, use my initials and put my ORCID in there. But I don't just have the ORCID as my name.
Justin:
[1:33:51] That was pretty cyberpunk in my imagination.
Jay:
[1:33:52] No, no, I just have "Colbert JL," because all, you know, library science uses APA, and that uses initials instead of first names. So that's why I picked my boring-ass stupid name that I don't like: so that my initial didn't change. Fun fact, I don't like my name, but I did it so my initial didn't change. [Crosstalk: "I like it."] I know you do, but you don't... it's not your name.
Justin:
[1:34:20] Okay, well, I think then we've covered everything. Is there anything, Mike, that you want to plug?
Mike:
[1:34:27] Um, I guess just shameless Bluesky dogshit. I'm just ahemnason.bluesky.social, A-H-E-M-N-A-S-O-N. You can also find me in a number of places as a cab for cutie, a username I am very proud of. I feel very good about it. And that's really it. I'm all over the place doing talks for people. I don't... I guess people like the way that I talk really fast and say a lot of things emphatically. And if you see me in an association call or whatever, you know, come on in. And I always love to answer questions. Like a lot of people who burn out, and I'm on the tail end of one of those right now, I'm always happy to answer questions. So drop me a line if you want to know anything about metadata, and I will do my best to answer that question within the limits of my sanity, generally.
Justin:
[1:35:12] Okay, great. Thanks so much for coming on.
Mike:
[1:35:14] Thanks for having me. Good night.