From: R. Russell Allbery <[r--a] at [CS.Stanford.EDU]>
Newsgroups: rec.arts.comics.info
Subject: Comics database
Date: 6 Feb 1994 18:27:42 GMT

Okay, here you go.  Sorry about the delay.  First is a description of my
database, and then I attached all the conversation on rac.misc about
building a database, as well as some information I got from comp.databases.

--------

A * indicates that a field is a key field.  N are numeric fields, A are
alphanumeric fields of size give by the number after the A, D are date
fields, and $ are money fields.


Comics Table
------------
 Code                      &  N*             This table is linked to Stories
 Title                     &  A40            by the Code field and the
 Volume                    &  N              Book Code field in Stories.
 Number                    &  A30
 Version                   &  A10
 Printing                  &  N
 Market                    &  A6
 Publisher                 &  A25
 Date                      &  D
 Format                    &  A20
 Color                     &  A5
 Number of Pages           &  N
 Cover Price               &  $
 Condition                 &  A4
 Book Value                &  $
 Continuity                &  A40
 Appearances               &  A60
 Notes                     &  A60


Stories Table
-------------
 Book Code                 &  N*             The Book Code and Story Code
 Story Code                &  N*             fields link this to the Jobs
 Story Type                &  A15            table.
 Page Number               &  N
 Story Grade               &  A2
 Art Grade                 &  A2


Jobs Table
----------
 Book Code                 &  N*             The Creator Code links this
 Story Code                &  N*             to the Creators table.
 Creator Code              &  A8*
 Job                       &  A10*


Creators Table
--------------
 Name Code                 &  A8*
 Full Name                 &  A40


---------

From: [s--g] at [rigel.statoil.no] (Stig Tollefsen)
Newsgroups: rec.arts.comics.misc
Subject: Comic Databases
Date: 24 Aug 93 14:29:22 GMT

This is the revised and updated version of my contribution. This
document includes tables for characters and continuity, and takes into
account some of the suggestions made by other good people. My apologies
for not posting it earlier, but I've been frightfully busy! Be sure to
post suggestions and criticism.

Just a few notes on the comics database tables:


Primary key:

Title <string>
Volume <integer>
Issue No <string>
Version <string>
[Distribution <abstract type (Boolean) Direct/Newsstand> suggested by
Scott McMahan - put it in if you care about the distinction.]
Print No <integer>

Title is the exact title as written in the indicia. Volume is the series
number - Hawkworld v1 is the Truman mini prestige while Hawkworld v2 is
the 32 issue ongoing. There is a problem with such a construct though -
how to number successive runs of titles published by different
companies, like the numerous The Shadow titles. Also, some people
collect comics from more than one country - how do you deal with say the
American Aliens series vs. the UK one? Thoughts on this would be
appreciated. Issue No. needs to be a string because of peculiar issue
numbers like double numbering (Cerebus 112/113) and the notorious
Limited Collectors' Edition from DC with numbering starting with C-.
Version is for issues with variant covers etc. The string can be an
explanation like 'gold cover' or an established convention as with X-Men
v2 #1A-E. Print No has to go in the key for value reasons; note that a
reprint does not have to be identical to the first print; it does
however need to have the same title/number in the indicia. Thus neither
Marvel's Milestone editions nor DC's Silver Age Classics series qualify
as reprints; they are separate titles. With the latter, convenience
would force us to treat 'DC Silver Age Classics' as the title with say
'House of Secrets 92' as the Issue No. There is a drawback with using a
relational design; it does not easily enable us to follow a series
through name changes (Moon Girl/Moon Girl and the Prince/A Moon, A Girl,
A Romance/Weird Fantasy/Weird Science-Fantasy/Incredible Science-Fiction
is a good example ;-)). Neither does it let us model the branching or
merging of titles (Weird Science and Weird Fantasy becoming Weird
Science-Fantasy is an example of a merger, All-Star Comics becoming
All-Star Western and much later becoming All-Star Comics again, resuming
the old All-Star Comics numbering, is an example of a split). I have
done an object-oriented model of this, if anyone's interested I'll post it.

In addition to this, the main table could contain:

No. of pages <integer>
Cover price <in a pre-defined or self-constructed money format>
Format <abstract type, like prestige/standard/glossy/whatever>
Size <in centimeters or inches>
Date <abstract type - see later>
Year of Publication <integer>

I know values are a big deal for you speculators out there - a separate
table can be connected with the key of the previous as a foreign key, giving:

[key]
Number of issues <integer>
Grade <abstract type>
Current value <abstract money type>
Purchase price <abstract money type>

This would require one entry for a purchase of 10,000 Youngblood 1s in
NM, with Current value steadily decreasing :-P, but would require
several if these were spread over different grades. Alternatively, have
a value breakdown table showing the percentwise value of each grade (M
120%, NM 100% VF 70% etc.). These need to be tagged, because this
breakdown is different for different classes of comics (a NM $10 silver
age comic in FN is worth less than a current NM $10 comic in FN, for
some reason). This table could be keyed with a string saying 'current',
'silver age' etc. The above table would then be broken into two:

[key]
Number of issues <integer>
Grade <abstract type>
Purchase price <abstract money type>
Grade breakdown <abstract type>

[key]
Current NM value <abstract money type>

If you're collecting comics from more than one country, you obviously
need a currency conversion table as well, which is kind of complicated,
because you would need to keep each occurence in the database if you
want to calculate profits! For you Britons out there, this means that
your purchase of Spawn 1 at GBP 1.25 would show a profit as US values
rise, but you don't want that profit to waste away if the pound ever
gets stronger against the dollar. It's the exchange rate at purchase
that counts. And yes, I know you have your own guide but there are other
nationalities out there.

Now to publishers - this could be put in the main table, but if you want
to be fair to publisher collaborations like 'Marvel and DC Present
Featuring the Uncanny X-Men and the New Teen Titans' (wheee!!), you need
a separate table:

[key]
publisher <string>

This combined would constitute this table's key. It is a bit cumbersome,
and one of the reasons I'm not too fond of a relational design...

If you want to get really advanced, there is an additional problem:
Vertigo is part of DC Comics which is part of Time-Warner. All three
publish comics (at least I _think_ Time-Warner has published a few
mass-market paperback editions, but you get the point). Do we want to
distinguish Vertigo from DC, but all the same keep the information that
any Vertigo book is also a DC publication? If we do, then we need some
way to model it. For my o-o design I did a recursive container-class
construct, for a relational design it has to look something like this:

publisher <string>
sub-publisher <string> (with NULL as default)

Any query involving publisher would then have to be cross-checked
against this table to pick up sub-publishers as well.

Date:

Dates have to be defined in a separate table or as an abstract type, if
your package supports it. We need support for:

Weekly comics
Bi-Weekly comics
Monthly comics
Bi-Monthly comics
Quarterly comics
Annuals

At least. In some cases, it is desireable to sort comics by time of
publication. This necessitates the following information to be kept:

Key:
Frequency <string> (Monthly, Weekly...)

Max_in_year <integer> (12 for monthly, 53 for weekly...)
Sequence <integer> (5 for May, 18 for Late September...)
Name <string> ('January', 'Early August'...)

Name would not be used for weekly comics, so give a NULL for those.
Additionally, a Semi-Monthly type could be added, meaning 24 issues a
year, as opposed to Bi-Weekly which really means 26 issues a year.
Quarterly comics are a bit vague; each season should correspond to three
calendar months, but there is no consensus on which ones, and
subsequently this often varies. Note that this table is mostly constant,
and would better be implemented as an abstract type.
Year of publication should go in the main table.

Format:

It's a matter of taste whether one wants size to be part of Format or
not. I'll say 'Standard Format Magazine' is valid and therefore put Size
on its own, like this:

Format

Key:
Name <string>

Paper Quality <string>
Binding Method <string>
Cover Stock <string>

Size

Key:
Name <string>

Width <integer or real or some other type>
Height <ditto>

It is not appropriate to put colour in either the main table or the
Format table; format is independent of colour, you can have b&w prestige
format books as well as colour ones. Also, part of a book can be b&w and
part colour, so it's best to put this in with the story information.

Contents:

A comic does not always contain only one story with only one each of
writer, artist, editor, what have you. Sometimes a comic contains art
pages (pin-ups), text pages, editorial material and (shudder) ads. Most
comics also have a cover. In addition, individual stories are often
reprinted. You don't want to duplicate story information, so that rules
out a direct link (in database terms, what we have here is a
many-to-many relationship). Thus:

Contains:

[main table key]
[contents key]

These two together are the key for this table.


Contents:

[main table key] gives comic where first published
Type <string or abstact type> (Story, Pin-Up, TextPage, Cover...)
Sequence <integer>

Name <string>
Number of pages <integer>
Comment <free form text> for special information. Ex: 'Superman dies in
this issue'.

The first three are the key. Type and Sequence are supposed to convey
information like 3rd story, 2nd cover etc. This is needed because
stories and other material cannot be keyed by title - titles are often
reused, and some stories are untitled (I don't think there is a single
titled story in LSH v4, well 'The Quiet Darkness' maybe). Give default
of 'Untitled' to Name.

Of course, stories are often subdivided. A storyline can span several
issues, and a story can have several chapters within one issue. These
can be entered as usual in the Contents table, but left out of the
Contains table. In this way, the Watchmen storyline can be shown to
include all 12 (comic-book story) chapters/sub-stories plus the text
pages. A separate table must be used to link these:

Storyline:

[contents key for storyline]
[contents key for chapter]

Queries must be written to use the contains table when traversing
comic->contents, but search storyline table when traversing
contents->comic. The possibilities are many and varied.

Creators:

[contents key]
Job <string, or abstract type>
Start page <integer> default 1
End page <integer> default number of pages
(If you are really hot on this idea, you can have start/end panels also here)
Name <string, or abstract name type>

Job would be penciller, inker, writer, plotter, painter, colourist, whatever.
If you want to do searches on creators by surname, an abstract type with
Surname, First Name, Intitial or something of the sort is required.

There are several points to make here. I have used a lot of mostly
undefined abstract types; these are trivial with a good package but may
cause problems. Also, what to do with aliases? Graham Ingels and Ghastly
are the same person, when you do a search on one, do you automatically
want the other? Then you need a separate table which cross-references
all aliases. What about Louise Jones/Louise Simonson? A good comics
database should handle this. Finally, keys become quite large here. If
your package can generate unique aliases, use them. Another desireable
feature is the ability to create sort algorithms for the abstract types.
Ideally this model needs an object-oriented DBMS; I have implemented
some of it using ONTOS and C++, it really does the job much better. I
guess my point here is that with all things taken into consideration,
the average user just can't build a satisfactory solution himself. I'll
say a commercial product would be beneficial. Comments?


Characters:

Modelling characters is similar to modelling creators. For example, Dick
Grayson, Jason Todd and Tim Drake are different characters, but they are
all (or have been) Robin. Similarly, Iron Man, Captain Marvel, Thor,
Doctor Fate, Batman even - have all had different people behind the
mask. When we ask for Batman, we want all Batman appearances - no matter
who's behind the disguise, but asking for Bruce Wayne should exclude all
of AzBat's. To do this, an equivalence table can be constructed:

Alias:

character <string>
alias <string>
start [contents key]
end [contents key]

This will have to be tied up with a continuity table (see later).
Without the start and end attributes, there would be no way to
distinguish Tim Drake's appearances before he became Robin from those after.

We need to link a character to a story:

[contents key]
appearance type <abstract type, cameo, guest-star/featuring/supporting etc.>
character <string, or abstract name type>


Continuity:

Few things make fans more angry than bad continuity. With the wonders of
database technology, you too can become a continuity wizard! A few
points must be made:

Even though two connected stories may have continuity conflicts, this
doesn't necessarily mean that continuity's screwed, only that it has
branched into different paths. For our purposes though, we have to
assume that when one story makes a reference to another, they belong to
the same continuity. The destruction of New Genesis in the Hunger Dogs
was later retconned away. This is valid; continuity before Hunger Dogs
branches into two different continuities: the Hunger Dogs one and the
current one. Cosmic Odyssey belongs to the Hunger Dogs continuity since
it mentions the destruction of New Genesis. The inconsistency happens
when current continuity makes references to Cosmic Odyssey! You can't
have your cake and eat it too.

Continuity cannot be dictated from the editorial regime; once the
stories are written, they define continuity by themselves. By writing
down all outside references in each story, a continuity tree can be
constructed. Note that if stories A and B both refer to story C, and
story A also refers to story C, the last reference is redundant, and
should be thrown out. Note also that by nature references are made to
stories previous in (virtual story) time, while we need to traverse
continuity in the opposite direction, from start to finish. This must be
done in the database package using some sort of rule. If story A refers
to story B, then (part of) story B takes place before (part of) story A.
The key of story A is then inserted into story B's continuity table.
There are of course some difficulties with this. A story X using
flashbacks could have the framing sequence referring to story W, while
the flashback sequence occurs way back before story A (assuming
linearity A-X). You then get a closed loop, and this has to be resolved
by the means of database rules. Time travel stories also make life
interesting (remember Mr.Z meeting Superman *again* in Superman 53(?),
while Supes never saw him before? Later on, in the 'Time and Time Again'
storyline, Supes meets Mr.Z again in Germany during WWII, and this is
Mr.Z's first meeting with Supes? Closed loop, folks). The table itself
is simple:


Continuity:

[story key]
<key of forward referenced story>


Remember that you don't have to put in tables you'll never use, and if
you do, they won't take up disk space unless you fill them in. Other
fancy stuff can easily be added, like picture files for covers (assuming
you have access to a scanner), but this will inevitably eat up your hard
drive. I can't help thinking that CD-ROMS full of already keyed-in data
would be a good thing for both us and the publishers (done correctly,
they could avoid all continuity glitches). Comments are appreciated, and
criticism too; I'm not perfect.

Stig Tollefsen

E-mail: [s--g] at [vega.data.st.statoil.no]

"It's a long hard road, and a full hard drive"   - "Networking", Warren Zevon

===============================================================================

From: [s m d] at [floyd.brooks.af.mil] (Sten Drescher)
Newsgroups: rec.arts.comics.misc
Subject: Re: Comics Databases
Date: 6 Aug 93 05:16:04 GMT

Scott> Stig Tollefsen ([s--g] at [rigel.statoil.no]) wrote:

Scott> And a newsstand/direct field.
	This, as well as printing #, can all be covered by a single
version field.

Scott> The emphasis has been on speculating so far, but we need
Scott> to include some creator data.
	Which, of late, has been real motivation for speculation.  I
don't want value data in the database for speculation, I need it so I
know how much to insure my collection for.

Scott> Personnel table
Scott> [key]
Scott> Writer:
Scott> Penciler:
Scott> Inker:
Scott> Colorist:
Scott> Letterer:
Scott> Editor:
	Already suggested.  However, a fixed list like this won't work -
what do you do with plotters, scripters, 'breakdowns', co-writers,
multiple stories, 'anniversary' issues where the creative team changes
every 4 pages, etc.

Scott> Instead of that huge key, it might be better to implement a
Scott> book table:

Scott> Title:
Scott> Publisher:
Scott> Unique Code:
Scott> Series type: ongoing, mini-, one-shot, special, graphic novel etc.

Scott> And use Unique Code, Vol, Issue as the key. Normalize it a little.
Scott> And you could add notes to the book table about the book in general.

Scott> Ex:

Scott> Title: Avengers
Scott> Publisher: Marvel
Scott> Uniq Code: AVE
Scott> Series: ongoing

	Well, the unique key should be automatically generated.  A good
hash function should be able to handle it, but I would prefer the table
to be more like:

Title: The Legion of Super-Heroes
Volume: 3
Series Type: ongoing
Publisher: DC
Sort Key (never seen, generated): legionofsuperheroes_3_ongoing
Hash Key (never seen, generated): 13642

The primary key for the file would be the hash key (I'm assuming a 64K
limit on titles ;-), with secondary keys based on the sort key, and the
publisher and sort key. The sort key would be generated by stripping
leading any a/an/the, whitespace, punctuation, and converting Roman and
Arabic numerals to the appropriate sort value (will have to prompt about
this - more in a sec) in the title, and concatinating that with the
volume number and the series type, separated by underscores.  The number
conversion would need to be manual, because of the widely differing
situations:
Generation X: no conversion
Robin II: II -> 2
Robin 3000: 3000 -> threethousand
etc.

Scott> Maybe even an equivalent and/or became table...

Scott> BECAME table

Scott> West Coast Avengers : Avengers West Coast
Scott> Classic X-Men       : X-Men Classic
Scott> Coke                : Coke Classic
	The became table is good, but it needs to handle splits and
merges (Dr Strange + Cloak & Dagger => Strange Tales => Dr Strange,
Sorcerer Supreme + Mutant Misadventurtes of Cloak & Dagger).  It will,
if you don't insist on unique keys.


Scott> So that only one record for a title would be in the book table --
Scott> and use the current name for the book...
	Bad move!  How would you have dealt with the situation above?
Just because the numbering isn't carried on, doesn't mean that the
storyline isn't continued, and that's the primary purpose of the become
table.  Also, I believe that several of the early Marvel superhero
titles started sharing a pre-hero title, then split into multiple
titles, each picking up the numbering of the former title.  I know the
Hulk did, and I think Iron Man, Captain America, and Thor did the same thing.
--
Sten Drescher		[s m d] at [floyd.brooks.af.mil]
#include <disclaimer.h>

===============================================================================

From:   IN%"[m c tajdi] at [vaxd.dct.ac.uk]" 17-AUG-1993 08:44:11.53
Subj:	Paradox: Need help implementing a database design

>Here's the problem:  Each comic book can have one or more "stories," where a
>story can be anything from the entire book to a single page gag.  Each of
>these stories are done by creators (usually writer, penciler, inker, letterer,
>and colorist, but this can change), and the creators can change from story to
>story.  I want to be able to keep track of all of the creators who worked on a
>given comic, and also keep track of which story they worked on.  There are so
>many different possible divisions of labor, as well as the possibility of four
>or five pencilers in a single story, that I can't use Writer, Penciler, etc.
>fields.  Instead, I would like to have a name field and a job field so I can
>enter the name of the job.  Essentially, I think there would be a table of
>stories for each comic entry, and a table for each story listing all of the
>creators.  I have no idea how to do this.
>
>Worst case scenarios: Around six stories in a single comic, each with a
>different set of creators.  Up to forty creators on a single story with an
>anthology story (20 or so writers and 20 or so artists).  These extremes are
>what makes this difficult.


Something like this perhaps........


TABLE         FIELDS      TYPE          comment

Comic         Comic#      A5*      -    unique identifier (allows letters).
              Title       A30   \  _   /and any other info you keep 
              Date        D     /      \  for each comic.

Story         Story#      A5*      -    unique identifier.
              Title       A30      -    and any other info....
              Comic#      A5       -    link to 'Comic' table.

Person        Person#     A5*      -    unique identifier.
              Name        A30      -    and any other info.....
              Occupation  A20      -    eg writer
              
Job           Job#        A5*      -    unique identifier.
              Person#     A5       -    link to 'Person' table.
              Story#      A5       -    link to 'Story' table.
              

  This is just off the top of my head, no guarantees.....
  The links enable you to join all these tables when doing queries.

Let me know if you have any problems.

Alex

===============================================================================

From:   IN%"[T--M] at [SRC.SBS.utah.edu]"  "Tim Ma" 17-AUG-1993 18:13:40.46
Subj:	RE:  Paradox:  Need help implementing a database design

Interesting idea.

Just off the top of my head, but a hierarchical design seems to be
a possible solution.  Work from the large entity to the smaller
entities.
So, from  COMIC BOOK -->  STORY  -->  CREATORS  (3 tables)

Table1:  COMICS - this table should describe the comic book as a whole
Fields:  [ID]        - an arbitrary number for ID purposes (unique)
         [Publisher] - Example:  DC Comics
         [Title]     - Example:  Superman
         [Number]    - Example:  241
         [Date]      - Date released or something

I'm not quite sure how comic books are identified, so you could
add whatever fields you think are necessary to help you identify
the book as a whole.  Perhaps [Series] = Superman   and
[Title] = "Death of Superman" or whatever.  Do comics have actual
titles like that?  You will want to index the [ID] field in
this table because:  (1)  ensure it is unique  (2)  speedier lookups


Table2:  STORIES - identifies stories within each book
Fields:  [ID]  -  should correspond with ID of the whole book
         [Story #]  -  (see notes below)
         [Story Title] -  (see notes below)
         [Type] -  (see notes below)

The [Story #] is just another arbitrary number that identifies
the story within each comic.  The logical choice would be just
starting from 1 and working your way up.  The number should be
unique *within* the book, but may be duplicated across comic
books.  Example:

[ID]  [Story #]
----  ---------
1001      1       first story in comic 1001
1001      2       second story in comic 1001
1001      3       etc ...
1002      1       first story in comic 1002
1002      2       etc ...

[Story Title] and [Type] are just identifiers.  Again, you know
more about the field that I so you could probably think of better
fields to identify stories with.  Maybe page numbers (?).  I put
[Type] because you referred to "single page gags".  Perhaps there
are other descriptors.  You will want to index both [ID] and
[Story #] in this table.


Table3:  CREATORS
Fields:  [ID]  -  corresponds with both of the above tables
         [Story #]  - corresponds with STORY table
         [Name]     - name of creator
         [Title]    - writer, inker, penciller, whatever ...

You may want to index the first 3 fields here.  This will ensure
that you have a unique  comic-story-creator  value.  Unless of
course, the same person does the writing, inking, pencilling, etc...
You have some leeway with this table.  It just depends on how
you want to break it down and identify the creators.  So you can
see, CREATORS duplicates the [ID] and [Story #] fields in STORY,
but it doesn't duplicate the story title, story type, etc ...

With these 3 tables, you should be able to find a comic book
using any of the above fields (fields from all 3 tables).

Suppose you wanted to see a list of the comic books (title
and individual stories) "Bob Jones" had a hand in, but only
if they were published by DC Comics.

In Paradox, you do the ASK command to query the tables.  Pull
all three tables up using ASK and do the following:


COMIC --- ID ---- Publisher ----- Title ----- Number ---
       |      |               |           |            |
       | a    |   DC          |  X        |  X         |


STORY --- ID ---- Story # --- Story Title --
       |      |            |               |
       | a    |            |  X            |


CREATORS --- ID ---- Story # ------ Name ------- Title --------
          |      |             |             |                |
          | a    |             |  Bob Jones  |                |


** I'll use an 'X' instead of a check mark since the check mark
   is an extended ASCII char and probably won't make it thru the
   mail gateway.  [F6] generates a checkmark in Pdox.
** The 'a' is a query-by-example character.  The character itself
   is arbitrary as long as it's the same for each table, just hit
   an [F5] and type out a character.  It should show it highlighted.
   Pdox manual has more.

So for your output you would get a table in the following format:

ANSWER --- Title ---- Number  --- Story Title ------------
        |          |           |                         |
        | Batman   |  102      |  Tales of ...           |
        | Batman   |  105      |  Batman eats Robin ...  |


These would be all the comic books published by DC where "Bob Jones"
had a hand in creating some of the stories.

kinda rough, but like I said, it's just off the top of my head.
You'd have to work out a way of getting data in there by creating
a multi-table form, but that's not too bad.

hope this helps,

tim ma
associate programmer
university of utah
email:  [t--m] at [src.sbs.utah.edu]