Vague meaning author term #203
Comments
@mrjj, this is a similar issue to #202, with a similar answer. The choice of the WG was to reuse, as much as possible, the schema.org definitions. This is what we did. In essence, the Working Group did not want to engage in defining yet another vocabulary for such terms, and we prefer to let the community do that where such terms are already defined. Schema.org being one of the most widely used vocabularies, also at the core of search engines, this was a pragmatic choice. You are right that the various vocabularies you refer to provide more detailed definitions of authorship, and I can also see that some applications may want to use those instead. This is possible: the manifest is not a closed entity and it is perfectly possible to use those terms instead. See https://www.w3.org/TR/pub-manifest/#extensibility-manifest-properties for more details. |
I think there's an unintended issue here that's the opposite of what is being asked. By the schema.org definition, creator is synonymous with author:
That part of the definition didn't make it into the specification but seems like an important piece to make people aware of. |
Yes, exactly for this reason I propose using an existing, widely adopted domain vocabulary to clarify these terms. I always prefer to use a standard, whatever it is, but to use it without violations and to coordinate any problems found with the WG, first of all by providing practical use cases. I understand that your primary goal is alignment between the variety of user agents and the Schema.org model while keeping the status quo, and this seems sane. You say that compatibility with the knowledge graphs behind search engines (SE) is important. Yes, no doubt about it. So let's suppose I'm an author and I'm interested in raising the SE rank of my publication; this is a common use case. In which field should I place the author's name for this? I suppose you have no idea, I have no idea, and the SE maintainers have no idea, nor do the creators of the initial data model. As I've seen it, large SE development teams are mostly young, talented, STEM-oriented people with a huge burning stack of business features and very few ideas about how domains outside IT work. They just don't have time to discuss the details of the entity models they receive from more and more data pipes. Supporting some new data exchange standard is a very common type of burning feature. After reading the standard (with an eye to reducing implementation cost and time), everything that was not clear gets resolved ad hoc. Then all the other agents in the data exchange get poisoned by this ad hoc resolution, and after some time they get a formal internal specification covering both model and logic, because for an SE it's important to explain to other players how the ranking logic works. All this forms the interexchange core between the majors. After reaching parity of internal interests, the majors open data exchange with mid-size and small players using the same model, but documented for third parties. Early adopters are often STEM people of the same kind, but building a domain-specific business or integrating a major's solution.
They have even more burning business features, and they see a sane strategy not in gathering their own data banks but in using ready-made ones from the majors. The best industry specialists will be brought in to govern and supervise the exchange formats, align them with the interests of market players and the wider public, and prevent the variety of exchange formats from growing. Then this negative feedback loop will repeat mechanically, or a new one will start with a new kind of data market offer. From my side I am offering what are, IMO, very delicate ways not to feed this process, assuming that you can provide users of the standard with some comments on the logic behind it and some rules of thumb to save adopters' time.
I supposed that using only one of the two synonymous (actually not) fields for the core fields is a way toward this goal.
I think explanations alone may help without affecting any technical compatibility. Maybe it's an option to introduce an explicit display priority for the core fields. It would help make tradeoffs caused by limited screen space predictable, as would sort logic for publication lists. I expect more hidden pitfalls with these quasi-synonyms. All ancient stones that bibliographers already met before they omitted |
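A minimal sketch of what an explicit display priority for the core fields could look like when screen space is limited. Everything named here is hypothetical: `DISPLAY_PRIORITY`, `names_to_display`, and the order of the fields are assumptions for illustration and are not defined by the manifest specification.

```python
# Hypothetical priority order; the specification does not define one.
DISPLAY_PRIORITY = ["author", "creator", "editor", "contributor"]


def names_to_display(manifest, max_names):
    """Pick up to max_names distinct names, walking fields in priority order."""
    if max_names <= 0:
        return []
    shown = []
    for field in DISPLAY_PRIORITY:
        value = manifest.get(field, [])
        # A field may hold a single name or a list of names.
        entries = value if isinstance(value, list) else [value]
        for name in entries:
            if name not in shown:
                shown.append(name)
            if len(shown) == max_names:
                return shown
    return shown


manifest = {
    "name": "Alice's Adventures in Wonderland",
    "author": "Lewis Carroll",
    "creator": ["Lewis Carroll", "John Tenniel"],
}
print(names_to_display(manifest, 1))
print(names_to_display(manifest, 2))
```

With a fixed order like this, a client that can show only one name always resolves the `author`/`creator` overlap the same way.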
Perhaps not, but the practicalities of schema.org can't be changed here. It's a vocabulary that does arguably suffer a bit from a lack of central control over the design of the types, but that messiness is also the reality of expressing web metadata. In neither specification was it our intent to press for specific or restrictive metadata practices, as we wanted to adapt to what people already do on the web. Taking out author isn't solving the problem, for example, as you cannot stop someone from using a term that is a part of schema.org (or this group has shown no interest in excluding valid terms). If you really want the term removed, you need to take your arguments to the schema.org folks. And, unfortunately, my experience is that all the explanations in the world aren't going to make people author metadata "correctly", either. Creator is often confused with contributor, for example. It's not uncommon to find the names of people who worked on the digital format listed as creators in EPUB. We can quickly get bogged down in all kinds of minutiae if we try to design bibliographic records, and the reality is that many people will still do whatever they intuitively think is right. It's also a common view in these parts that if proper bibliographic records are needed, they are created and stored separately by publishers. The metadata within a publication is more specifically geared toward the user agent with an emphasis on simplicity and only defining what can be justified as usable by a user agent. We're not out to displace ONIX, MARC, etc.
I've had this concern in the past, but you also have to understand that the publication manifest specification is not web publications. It is a common base from which specific implementations, like Audiobooks, are intended to be designed (that specification recommends the use of author, for example, and is silent on creators). I'm not sure if we can prioritize display at this level, as @iherman has already noted, implementations might use different metadata or could prioritize display differently. |
@mattgarrish, thank you for such a deep level of understanding.
I have to sit on both chairs too. Data vendors may have fine-grained ontology frameworks and concepts (not true), and these are OK for the internal exchange process, but they are not so useful when you are trying to help people gain access to a publication: at the level of casual language, people can assume a completely different meaning. It is different again in the publishing case, where we have companies that develop software clients and don't want to go into the domain; they want a clear description of the format, forms, display view and so on. The same goes for publishers, considering that the object of publishing is a federal-level legal deposit and the clients target 80%-90% of the local active mobile device base. There are a lot of sides. You don't want to play the definition game, and neither do I, but Readium did, and did it well. Now their activity is being merged with W3C, and the web-pub specs are moving further and further from what Readium maintained with concern-level standards. Technically it's not a big deal to rename fields, but it's hard to align all the parties involved in publication distribution. If the Web Publication manifest standard assumes that everything is settled at the domain level, then what is the final purpose of the standard compared to a thin wrapper around JSON-LD/RDF? Well, in only two quarters about three flagship standards for web and mobile publishing have appeared. Currently we are trying to support all of them at the authoring level. But that will be much harder at the level of mature software with long-running vendor guarantees behind it; e.g. Adobe InDesign currently supports only a small subset of EPUB 3.0, and this is quite an expected case. It's also not easy to align all this with integrity solutions based on JSON Schema 2019-09, which integrates the entity schema with its semantic linkage constraints. And it's not as hard to be just an early adopter as it is to have a history of publishing and library domain standards behind you.
Those standards have ready answers and a (one way or another) working software ecosystem. But they have proven to be unfriendly to the end user and to this level of requests and understanding. It's very hard to come up with something that rejects core industry experience, and it's no surprise that any major adopter will involve a lot of domain-level experts. When you arrive with core definitions that are dirty with respect to the domain, and a long story explaining that this is because you have no idea about the domain, it makes for a very bad introduction. I really hope that describing the major problems of being a dpub/webpub standards adopter makes some sense and that these are not just local-level things. Otherwise you have to extend not the authoring/client side but the LD integration platforms, following Europeana, and that is a big bunch of resources spent in a completely different direction. For now we will just try to keep track of changes and align with everything that changes, and we will try to reach proposal level through OCLC, as one of the domain coordinators and a major bibliographic metadata provider through WorldCat. But it's hard for me to see W3C only as the tech-level side and coordination site for all this. This is off the current topic, but I've provided some highlights of our audiobooks experience in w3c/wpub#465. In our quite massive use case, audiobook metadata happens to be an a11y-extension-level thing, because the audiobooks are mostly not stand-alone creative implementations but the product of automatic representation through a very advanced voice tech stack from a local major (a schema.org core maintainer as well). In practice it turns out to be not too far from visual render directives, and it does not affect the metadata describing the structure and content of the core creative work. Anyway, thank you a lot for your time and understanding! |
A Pub Manifest can contain any schema.org property not defined in the spec (section 1.4). Selecting both In a standard, having two ways to do the same thing is always a recipe for disaster. I'm therefore suggesting to remove |
Schema.org seems to give primacy to author over creator, as it only notes creator is a synonym for author, so works for me. But do we need to run this by the WG before making a change? It's not an obvious error of the specification. |
Note that, while schema.org makes them equivalent for the SEO case, I could see the point of using them separately for the purposes of publications. After all, a creator and an author _are_ different notions, aren't they?
Yes, this is not something we can decide; it is not, strictly speaking, a bug in the spec.
|
No it is not a bug, it's only a path to ambiguity during its deployment. Which is something to avoid. |
Right. So we may have to put a note into the document making it clear(er) when one should be used over the other (noting that schema.org does not make a difference between the two). IMHO, this is actually a schema.org bug. Just making these two terms "identical" is some sort of a semantic bug. |
I don't follow you here Ivan. Why not simply delete it from this set of recommended properties in our spec? People can still use |
Sure, "creator" is just an ambiguous designation for someone who played some significant role in the creation of the content. It's the default when you can't say anything more meaningful. It's also a big reason why EPUB authoring metadata is complicated to process for reading systems, as we've had to find ways to inflect more meaning onto the dc:creator element. But what's worrisome in this case is that the equivalence has already been set up by schema.org. It might not be wise to try and alter that. Otherwise, your metadata will come out meaning one thing for a search engine and possibly something else for a reading system, and that's not a good situation. I've kind of changed my position on removing it, since removing it doesn't really address the problem. The property is valid whether listed or not, so it just leaves the confusion unaddressed if we drop it. Like schema.org, we should probably clearly note that they are synonyms and even go so far as to state that during processing creators will be translated/appended to author (i.e., author is preferred and when present gets highest priority). That way there's no ambiguity after processing, at least. |
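The processing idea in the comment above (append creators to author, with author taking priority) could be sketched roughly as follows. This is an illustration under stated assumptions, not the specification's algorithm: `normalize_creators` is a hypothetical helper, and names are treated as plain strings rather than full entity objects.

```python
def normalize_creators(manifest: dict) -> dict:
    """Return a copy in which creator entries are appended to author.

    Assumption: existing author entries keep their position (highest
    priority) and creator names are appended after them, without duplicates.
    """
    def as_list(value):
        # Accept a missing value, a single name, or a list of names.
        if value is None:
            return []
        return list(value) if isinstance(value, list) else [value]

    result = dict(manifest)
    authors = as_list(manifest.get("author"))
    for entry in as_list(manifest.get("creator")):
        if entry not in authors:
            authors.append(entry)
    # After processing only "author" remains, so there is no ambiguity.
    result.pop("creator", None)
    if authors:
        result["author"] = authors
    return result


processed = normalize_creators({
    "name": "Alice's Adventures in Wonderland",
    "author": "Lewis Carroll",
    "creator": ["Lewis Carroll", "John Tenniel"],
})
print(processed["author"])
```

The input manifest is left unmodified, so a user agent could still inspect the original `creator` values if it wanted to treat them differently.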
Closing this issue as we added a note earlier about the terms being synonyms with preference being given to author. That's all I can see we can do unless or until schema.org changes their approach. |
The Web Publication Manifest has a quite ambiguous `author` field.
The `Author` term is usually used for someone who originated a written creative work. `Creator` is the wider term. When a concept set and its subset are defined at the same level of the bibliographic description, it is unclear which field to use (or whether to fill both with the same value).
Lewis Carroll is both author and creator of the work "Alice's Adventures in Wonderland", and the only example in the WPM body indicates that the `author` field should be used, with no clues about the criteria of use or the explicit difference between 'author' and 'creator'.
The short solution I see is to use better definitions of the `author` and `creator` fields, e.g. from the LC relator codes vocabulary (text and links below).
The best solution I see is to remove the `author` field, because even with clear criteria for the difference there will be a second-level problem: how to interpret these fields during translation to other standards and forms (which field should be primary if you have display space to show only one, and so on).
Below I provide related fragments from neighbouring standards, some explanation of the background, and why it would be better to remove the `author` term and not `creator`.
Definitions from W3C Web Publication Manifest
Definitions from DCMI
The `DCMI` standard doesn't introduce the `Author` term.
The `FOAF` note on the relationship between the `dct:creator` / `dct:agent` and maker terms is worth mentioning:
Definitions from MARC21 relator terms vocabulary and LC LD relator terms vocabulary
`aut` - A person, family, or organization responsible for creating a work that is primarily textual in content, regardless of media type (e.g., printed text, spoken word, electronic text, tactile text) or genre (e.g., poems, novels, screenplays, blogs). Use also for persons, etc., creating a new work by paraphrasing, rewriting, or adapting works by another creator such that the modification has substantially changed the nature and content of the original or changed the medium of expression
`cre` - A person or organization responsible for the intellectual or artistic content of a resource.
In fact, I don't see that the `aut` / `cre` codes are really common in bib catalogues; usually they are omitted and only the name/auth code is defined. The kind of relationship is defined by bibliographers only for a relation more specific than being the originator. So even in the official LC example of the MARC21 100 main name entry field you may see no authors or creators defined explicitly.
I'm pretty sure that this is not an accidental detail, because a bibliographer always carries part of the authority institution's responsibility and guarantees a single description, as complete and correct as possible, defined within the scope of specific functional requirements and a description standard. Any collision between two actual descriptions usually means that one of the versions should be considered outdated.
This simple authority control rule is the main pillar that makes de-duplication of metadata possible. It also prevents a lot of holy wars about which record is right: the question becomes which of them is outdated and which should be corrected. The sane answer is to delete the record with the more recent control code (less background and sync history), and maybe to correct it toward the other record.
All this usually keeps cataloguers from dealing with ambiguous forms of definitions that have a chance of being catalogued differently, even if there are some in the description standard's vocabulary.
BibFrame2
As the MARC family of standards drifted toward LD, it evolved into the BibFrame2 model (conceptually a FOAF-like agent-activity-entity model).
Bibframe has a complex spine of work concept levels, with three levels. The previous version had two levels with different names. It's hard to say that BF2 is clear or stable, but it's "good enough to use". Because it is not mature, there are currently many approaches to practical use and entity linking. But the core idea about contributors is something like the following:
The `Agent` is playing a `Role` (matching the relator terms vocabulary) and through this making the `Contribution`.
Web Publication manifest `author` / `creator` fields will have the form of `Agent`s making a contribution, as will Web Publication manifest `contributor`s.
`Agent` and its `Role` in relation to the resource. Used with `Work`, `Instance` or `Item`
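The Agent/Role/Contribution idea above can be sketched as a small data structure. This is only an illustration, not BIBFRAME's actual model or serialization: the class names mirror the terms in the text, `contributions_from_manifest` and `ROLE_BY_FIELD` are hypothetical, and the `ctb` code for `contributor` is an assumption taken from the MARC relator list (`aut` and `cre` are the codes quoted earlier).

```python
from dataclasses import dataclass

# Assumed mapping from flat manifest fields to relator codes.
ROLE_BY_FIELD = {"author": "aut", "creator": "cre", "contributor": "ctb"}


@dataclass(frozen=True)
class Agent:
    name: str


@dataclass(frozen=True)
class Contribution:
    agent: Agent
    role: str  # relator code such as "aut" or "cre"


def contributions_from_manifest(manifest: dict) -> list:
    """Turn flat author/creator/contributor fields into Contribution records."""
    contributions = []
    for field, role in ROLE_BY_FIELD.items():
        value = manifest.get(field, [])
        # A field may hold a single name or a list of names.
        names = value if isinstance(value, list) else [value]
        for name in names:
            contributions.append(Contribution(Agent(name), role))
    return contributions


for c in contributions_from_manifest(
    {"author": "Lewis Carroll", "contributor": ["John Tenniel"]}
):
    print(c.agent.name, c.role)
```

In this shape the `author`/`creator` distinction becomes a value of `role` on one kind of record rather than two competing top-level fields.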
I do not see BF2 as a perfect extension for WPM, because WPM seems to be flat by design. By contrast, BF2 is designed with a large temporal dimension and is able to describe the historical process that produced a cultural object one can interact with directly. The BF2 web of records is definitely hostile to any table-form representation and (e.g. by excluding shelf numbers and other storage identifier spaces) does not actually tend to describe the final layer of items or digital objects. Even the maintainers' official converters from other standards simply ignore information such as physical item storage marks or digital publication container processing details.
In short: BF2 isn't designed to provide records that are both isolated and understandable, and it is hardly targeted at publishers
AACR/ISFB/FRBR (It's a deep domain, so I'm not providing detailed links)
These standard families are the real ground for all the standards mentioned above, and DCMI may be considered a robust and consolidated shortcut down to the regulation and experience of the bibliographic domain.