Frequently Asked Questions
What is an LSRN?
An LSRN is a globally unique identifier of a record in a life science database.
Why do I need LSRNs?
LSRNs can be used as globally unique identifiers. There is a one-to-one correspondence between records and their LSRNs, so LSRNs can be used as unambiguous names for records.
What does an LSRN look like?
LSRNs use the familiar notation database_abbreviation:record_identifier and follow the practice originating from the GenBank/EMBL/DDBJ db_xref qualifier. Here are a few examples of LSRNs:
PMID:16446403
GO:0006915
EC:3.4.11.4
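As an illustration of the notation, here is a minimal Python sketch that splits an LSRN into its two parts. The names `LSRN` and `parse_lsrn` are hypothetical, and splitting on the first colon is an assumption made for this sketch, not a rule taken from the LSRN specification.

```python
from typing import NamedTuple


class LSRN(NamedTuple):
    """The two parts of a database_abbreviation:record_identifier pair."""
    database: str
    record_id: str


def parse_lsrn(text: str) -> LSRN:
    # Assumption: the first colon separates the database abbreviation
    # from the record identifier; later colons stay in the identifier.
    database, sep, record_id = text.partition(":")
    if not sep or not database or not record_id:
        raise ValueError(f"not in database:identifier form: {text!r}")
    # The identifier is kept as a string so leading zeros survive.
    return LSRN(database, record_id)


for example in ("PMID:16446403", "GO:0006915", "EC:3.4.11.4"):
    print(parse_lsrn(example))
```

Keeping the record identifier as a string matters for identifiers such as GO:0006915, where converting to a number would drop the leading zeros.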
I’ve seen these before. What’s the point of giving them a name?
The concept of pairing database abbreviations with identifiers is as old as the databases themselves. Such pairs work just fine in the context of a single database, but aren’t of much use in database integration (PMID:16446403 in one database may be called PubMed:16446403 in another). The goal of the LSRN project is to facilitate greater interoperability between databases and bioinformatic applications by promoting consistent use of record identifiers. The LSRN project also aims at simplifying cross-referencing by keeping LSRN-to-URL mapping in a single place.
All records have unique URLs. Why not just use them?
Ordinary URLs were not designed to be used as unique names. They have no canonical form: parts of them are case-insensitive, and query parameters can be reordered, added, or removed without changing the meaning of the URL. Besides, some databases have multiple mirrors (e.g. GenBank, EMBL, DDBJ) that assign different URLs to the same record. As a result, the fact that two URLs differ when compared as strings does not mean that they refer to different records. LSRNs do provide this guarantee: two different LSRNs always refer to two different records.
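The problem with string comparison can be shown with a short Python sketch. The two example.org URLs are made up for illustration, and the normalization in `normalized` is only one plausible set of rules (lower-case host, order-insensitive query parameters); real servers may treat URLs differently.

```python
from urllib.parse import parse_qsl, urlsplit


def normalized(url: str):
    """Reduce a URL to a comparable tuple (a rough, assumed normalization)."""
    parts = urlsplit(url)
    return (
        parts.scheme,                    # urlsplit lower-cases the scheme
        parts.hostname or "",            # hostname is returned lower-cased
        parts.port,
        parts.path,
        sorted(parse_qsl(parts.query)),  # ignore query parameter order
    )


# Hypothetical URLs pointing at the same record.
a = "http://Example.org/record?db=pubmed&id=16446403"
b = "http://example.org/record?id=16446403&db=pubmed"

print(a == b)                          # False: the strings differ
print(normalized(a) == normalized(b))  # True: same parts after normalization
```

Every consumer of such URLs would have to agree on the same normalization rules, which is the burden a canonical identifier removes.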
LSID solves the same problem, and it’s an official OMG specification. What’s wrong with it?
There’s nothing wrong with LSID; it’s a solid specification, well received by the industry. The difference between LSRN and LSID is in the details, but the details are important. LSID authorities are created by data providers for access to their own data; since, by design, the LSID resolution process returns actual data and metadata, third parties cannot host “proxy” authorities for copyright reasons. This means that there will be no “genuine” LSID for, say, PubMed records until NCBI decides to build its own LSID authority. Also, LSIDs have no official canonical string form (they have optional and case-insensitive parts), which hampers their use as XML/RDF URIs.
LSRNs do not return any data or metadata, so they can be assigned to any existing database record, including records which have restricted access (e.g. CAS:), multiple authorities (INSD:), or no single authority (InChI:). LSRNs are assigned in a centralized manner by the life science community. LSRN-based URIs have a canonical form, so they satisfy XML/RDF requirements for URI-based identity. And, last but not least, LSRN-based URIs are URLs which can be used as ordinary links in all browsers (e.g. PMID:16446403).
When I click on an LSRN link, I see the actual record. Is this legal?
The short answer is yes (if you are in doubt, consult your IP lawyer). Pages generated by the LSRN resolver quote original, publicly available content. Lsrn.org does not misrepresent the content’s origin: it shows the original URL and gives an unhindered view of the content at the user’s request. This practice is generally considered “fair use”; Google image search is a well-known example.
How can I use LSRNs in XML or RDF?
LSRNs are not Uniform Resource Identifiers (URIs) per se, but there is a simple way to build a unique permanent URL for each LSRN. LSRNs are mapped to URI space by appending them to the ‘http://lsrn.org/’ prefix:
http://lsrn.org/PMID:16446403
http://lsrn.org/GO:0006915
http://lsrn.org/EC:3.4.11.4
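The mapping shown above can be sketched in a few lines of Python. The function names `lsrn_to_uri` and `uri_to_lsrn` are hypothetical, and the sketch assumes the LSRN is appended verbatim, with no escaping, as in the examples.

```python
LSRN_PREFIX = "http://lsrn.org/"


def lsrn_to_uri(lsrn: str) -> str:
    # Assumption: the LSRN is appended as-is, without percent-encoding.
    return LSRN_PREFIX + lsrn


def uri_to_lsrn(uri: str) -> str:
    # Reverse mapping: strip the prefix; reject anything else.
    if not uri.startswith(LSRN_PREFIX):
        raise ValueError(f"not an LSRN-based URI: {uri!r}")
    return uri[len(LSRN_PREFIX):]


for lsrn in ("PMID:16446403", "GO:0006915", "EC:3.4.11.4"):
    print(lsrn_to_uri(lsrn))
```

Because the mapping is plain string concatenation, it is reversible, and equal LSRNs always yield equal URI strings.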
What does an LSRN identify?
An LSRN identifies a database record, not its subject. For example, PMID:16446403 is an identifier for a PubMed record, not its subject (an article in Molecular Cancer Research), or its subject’s subject (ubiquitination of p53). These differences may seem obvious or unimportant to humans, but they are crucial for computers, which have no contextual information with which to choose the “correct” interpretation.
How can I use LSRNs to identify abstract concepts like genes?
An LSRN for a record with an unambiguous primary subject can be used to identify that subject indirectly. In the RDF data model, this is achieved by connecting the blank node representing the concept to the LSRN URI node representing the record by means of a property typed as owl:InverseFunctionalProperty (an IFP relation). Two distinct blank nodes connected to the same LSRN URI node via the same IFP relation are then considered “the same” in the owl:sameAs sense. Here is an RDF/XML example which uses skos:subjectIndicator as an IFP property:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept>
    <skos:prefLabel xml:lang="en">TP53 gene [Homo Sapiens]</skos:prefLabel>
    <skos:subjectIndicator rdf:resource="http://lsrn.org/GeneID:7157"/>
    <skos:subjectIndicator rdf:resource="http://lsrn.org/HGNC_Symbol:TP53"/>
  </skos:Concept>
</rdf:RDF>
This approach to identity is called RDF smushing.
Note that one concept can use multiple IFP properties for identification; the more such properties it has, the better its chances to get “smushed” with its counterparts in other data sources.
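To make the idea concrete, here is a simplified Python sketch of smushing that does not use an RDF library. It assumes each blank node is described only by the set of LSRN URIs it points to through a single IFP property, and it groups nodes that share any such URI. The node names `a` to `d`, the function `smush`, and the `EX:1` identifier are hypothetical.

```python
from collections import defaultdict


def smush(nodes):
    """Group node ids that share at least one IFP value (union-find)."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the group representative.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    first_seen = {}  # IFP value -> first node that carried it
    for node, indicators in nodes.items():
        for uri in indicators:
            if uri in first_seen:
                # Same IFP value on two nodes: treat them as the same thing.
                parent[find(node)] = find(first_seen[uri])
            else:
                first_seen[uri] = node

    groups = defaultdict(set)
    for node in nodes:
        groups[find(node)].add(node)
    return sorted(groups.values(), key=sorted)


# Blank nodes from different (imaginary) data sources.
nodes = {
    "a": {"http://lsrn.org/GeneID:7157"},
    "b": {"http://lsrn.org/GeneID:7157", "http://lsrn.org/HGNC_Symbol:TP53"},
    "c": {"http://lsrn.org/HGNC_Symbol:TP53"},
    "d": {"http://lsrn.org/EX:1"},  # placeholder, not a registered scheme
}
print(smush(nodes))
```

Nodes `a` and `c` share no indicator directly, yet they end up in one group because `b` carries both; this is why additional indicators improve a concept's chances of being merged.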
How are LSRNs related to OASIS Published Subjects?
LSRN-based URIs conform to the OASIS Published Subjects specification; they are intended to be used as Published Subject Indicators (PSIs) and are compatible with existing PSI applications (Topic Maps, RDF). The XTM syntax for the TP53 gene can look as follows:
<topic id="TP53">
  <subjectIdentity>
    <subjectIndicatorRef
        xlink:href="http://lsrn.org/GeneID:7157"/>
  </subjectIdentity>
  <!-- names and occurrences -->
</topic>
How are LSRNs managed?
Lsrn.org maintains the open LSRN registry accompanied by the corresponding LSRN Schema vocabulary. The registry contains information on databases and record types, referenced by LSRN Schema (e.g. PMID:). The registry is updated through direct input from LSRN Editors; once a week (current schedule) the contributions that were not enacted immediately are considered by LSRN maintainers and either included or rejected based on editorial consensus. Most additions to the registry that do not conflict with existing entries are enacted immediately.
Who maintains the registry?
Lsrn.org is currently maintained by Ariadne Genomics, Inc. The domain was created as a permanent home for the LSRN specification, a portal for editors and contributors, and an LSRN-to-URL resolution service. The intention of the creators is to provide persistent identities for records in life science databases that can be used for as long as they are needed. The current owner of the lsrn.org domain undertakes to service persistent http://lsrn.org/* URIs for as long as possible. In the event that circumstances prevent the maintainer from fulfilling these obligations, the maintainer will pass ownership of the domain name and any associated content to a suitable party under terms that shall place no additional restrictions on the usage or accessibility of the representations provided.
How do I become an LSRN Editor?
You become an LSRN Editor by editing. Currently, there is no registration of any kind; anyone can edit the registry directly.
Why are some editing options not available for my scheme?
Since the LSRN registry is used by the resolver in real time, some editing options are disabled to make sure that previously made changes are not compromised in the middle of the registry release cycle. The current policy is that one can add to the registry with no limitations; editing the existing content is allowed as long as it does not affect the resolver.
What do I do if the editing option I need is not available?
You can always add an Editorial Comment, proposing the change to the maintainers (or voting down a previous proposal). At the end of the editorial cycle, all proposed changes are considered, and if there is a consensus, the proposed changes are enacted by the maintainer.
What is a redirect?
Some LSRN schemas exist only for compatibility with current use in various database cross-reference pages. These schemas are “redirected” to their recommended variants by the resolver. Compatibility schemas should not be used in LSRN-based URIs.
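A redirect can be pictured as a lookup from a compatibility abbreviation to its recommended variant. The following Python sketch is illustrative only: the `REDIRECTS` table and the `PubMed` to `PMID` entry are assumptions made for the example, not entries copied from the LSRN registry, and `recommended_lsrn` is a hypothetical helper rather than the resolver's actual code.

```python
# Hypothetical redirect table: compatibility abbreviation -> recommended one.
REDIRECTS = {"PubMed": "PMID"}


def recommended_lsrn(lsrn: str) -> str:
    """Rewrite an LSRN to its recommended variant, if a redirect exists."""
    abbreviation, sep, record_id = lsrn.partition(":")
    if not sep:
        raise ValueError(f"not in database:identifier form: {lsrn!r}")
    # Abbreviations without a redirect are returned unchanged.
    return f"{REDIRECTS.get(abbreviation, abbreviation)}:{record_id}"


print(recommended_lsrn("PubMed:16446403"))
print(recommended_lsrn("GO:0006915"))
```

Applying such a rewrite before building an LSRN-based URI keeps compatibility abbreviations out of the URIs, in line with the recommendation above.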
What are the main sources used to create the registry?
The bulk of the current LSRN registry content is derived from three main sources: the International Nucleotide Sequence Database Collaboration’s db_xref specification, NCBI’s Dbtag_db practice, and the Gene Ontology cross-reference abbreviations (GO.xrf_abbs). The remaining entries come from the databases themselves.