Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, Swift, BASIC, and whatever else I happen to feel like hacking at.

Importing an index into Nisus

Jerry Stratton, December 9, 2020

Oatmeal raisin cookies: Mary Starks’s crisp oatmeal cookies from America’s Bicentennial.

One of the recipes that made me like this particular cookbook series enough to manually index it.

I have a couple of cookbooks by charitable organizations that I really like, but like many cookbooks of that type they don’t have an index, or at least not a useful one for finding recipes. Of the four books I’ve used this script on so far, one has no index, two have an index in order by page number rather than recipe name, and one has an index in order by author rather than recipe name.

I typed these into a text file using a format that keeps the typing to a minimum, and I have a Perl script that converts that file into an index useful for looking up recipes.

Here’s the format I decided on for the import file. Headings are marked with a Markdown hash, and index entries are titled and tabbed just as they’ll be in the Nisus document.

# Recipes by name

## 1-9
1-2-3 Beef	a186
14 Carrot Cake (England)	a53
20 Minute Chocolate Sheet Cake	s119
24 Hour Lettuce Salad	a255
24 Hour Salad	a256,259

## A
All Season Shortcake	s177
Amazing Coconut Pie	s106
American-Style Enchiladas	a216
Angel Biscuits	s94
Angel Cake Supreme	a43
…
# Recipes by chapter

## Appetizers, Beverages & Miscellaneous

### A
Apple Pie Filling	s198
Aunt Nell’s Aig Nog	h6

### B
Bacon & Cheese Log	h3
Beef Dip	a8
Best Ever Grape Juice	h6
Bread & Butter Pickles	h192
…
## Cakes, Cookies & Desserts

### 1-9
14 Carrot Cake (England)	a53
20 Minute Chocolate Sheet Cake	s119
3 Layer Cake	h57
…
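
As a rough illustration of the entry format, here is a minimal Python sketch (not part of the Nisus macro) that splits an entry line at its tab. The function name `split_entry` is hypothetical, and the sketch deliberately leaves the page reference as an opaque string, since the meaning of the letter prefixes is not spelled out here.

```python
def split_entry(line):
    """Split 'Title<TAB>pages' into (title, pages).

    pages is None when the line has no tab at all.
    """
    title, tab, pages = line.partition("\t")
    return title, (pages if tab else None)


if __name__ == "__main__":
    print(split_entry("24 Hour Salad\ta256,259"))
```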

What I want to do is take this text file and read it into a document I’ve already formatted and made ready for this index. Each single-hash line is a section in that document. There’s a section for “Recipes by name”, a section for “Recipes by chapter”, and a section for “Recipes by author”. The macro doesn’t need to create these sections, because they’re already in the document. But it does need to find each headline and then move to, and empty out, the subsequent section. Each headline is followed by a two-column section that contains nothing but the index entries. Each time I run the macro, I want it to empty out those two-column sections and replace the old entries with the new ones.

Each double-hash line is a level 2 headline, and each triple-hash line a level 3 headline, that gets inserted into the waiting section; each unhashed line is a Normal entry. The script needs to loop through each line in the data file, search for and empty out the appropriate section when it reaches a single-hash line, and then insert, with the appropriate style, the rest of the lines in that section.


  • #read index file and insert into correct sections
  • If File.existsAtPath $dataFile
    • File.requireAccessAtPath $dataFile
    • $text = File.readStringFromPath $dataFile
    • ForEach $line in $text.split("\n")
      • If $line
        • If $line.hasPrefix '# '
          • ClearSection($line)
        • Else
          • If $line.hasPrefix '## '
            • InsertTitle(3, $line)
            • Format:Paragraph Style:Heading 2
          • Elsif $line.hasPrefix '### '
            • InsertTitle(4, $line)
            • Format:Paragraph Style:Heading 3
          • Else
            • InsertLine($line)
          • End
          • $document.insertText "\n"
        • End
      • End
    • End
  • Else
    • Prompt "No Path $dataFile"
  • End
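
The dispatch in that loop can be tried outside Nisus. What follows is a minimal Python sketch under stated assumptions, not the macro itself: the names `classify` and `walk` are hypothetical, and instead of inserting styled text the sketch only reports what kind of line it saw. The prefix lengths 2, 3, and 4 correspond to the offsets the macro uses when stripping the hashes.

```python
# Map each Markdown-style prefix to the kind of line it marks.
PREFIXES = {
    "# ": "section",     # find and empty the matching section
    "## ": "heading2",   # insert as a level 2 headline
    "### ": "heading3",  # insert as a level 3 headline
}


def classify(line):
    """Return (kind, text) for one non-empty line of the import file."""
    for prefix, kind in PREFIXES.items():
        if line.startswith(prefix):
            # Strip the prefix: 2, 3, or 4 characters.
            return kind, line[len(prefix):]
    return "entry", line


def walk(text):
    """Classify every non-empty line, skipping blank ones as the macro does."""
    return [classify(line) for line in text.split("\n") if line]


if __name__ == "__main__":
    sample = "# Recipes by name\n\n## A\nAngel Biscuits\ts94\n"
    for kind, text in walk(sample):
        print(kind, repr(text))
```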

There’s a catch: because this is an automated process, I can’t eyeball index entries to handle extra-long entries in a custom fashion. If a line goes over the right edge, it wraps. That in itself isn’t necessarily a problem, but lines don’t format the same when they wrap. If the tab wraps with the line, the page number gets aligned flush right with other page numbers. If the tab does not wrap with the line, the page number is aligned left, because there’s no tab to push it over to the right.

For that, I use this code:


  • $document.insertText $line
  • #if the line wraps, make sure the page number is flush right
  • $endOfLine = TextSelection.activeRange
  • $lineRange = $document.text.rangeOfLineAtIndex $endOfLine.location
  • $lineCharacters = $document.text.subtextInRange $lineRange
  • If !$lineCharacters.containsString "\t"
    • $beginningOfLine = $lineRange
    • $beginningOfLine.length = 0
    • TextSelection.setActiveRange $beginningOfLine
    • $document.insertText "\t"
    • #inserted one character, so end of line has moved one over
    • $endOfLine.location = $endOfLine.location + 1
    • TextSelection.setActiveRange $endOfLine
  • End

If there is no text selected, as there won’t be immediately after an insert, the activeRange is a Range whose location is the insertion point and whose length is zero. So $endOfLine is just the insertion point. The method rangeOfLineAtIndex provides a Range describing the text that is on the current line. If four characters have wrapped, this will be a Range of four characters. So $lineRange becomes that Range, and $lineCharacters becomes the text starting at that location and for that length. If those characters contain a tab character, everything’s good. This is either an unwrapped line, or a line that has wrapped but where the tab has also wrapped. In either case, the page number is flush right.

If the wrapped text does not include a tab, I add one. I change the insertion point to the beginning of the line, insert a tab, and then move the insertion point back. I have the insertion point saved in $endOfLine, but because I just added a tab, the end of the line’s location has just increased by one.
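
The bookkeeping is easier to see on a plain string. This is a minimal Python sketch, assuming the caller already knows where the current wrapped line begins (only the word processor really knows that, which is what rangeOfLineAtIndex is for). The function name `ensure_tab_on_last_line` and its parameters are hypothetical.

```python
def ensure_tab_on_last_line(text, line_start, cursor):
    """Make sure the wrapped line that ends at `cursor` contains a tab.

    text:       the document buffer as a plain string
    line_start: index where the current (possibly wrapped) line begins
    cursor:     insertion point, just after the entry that was inserted
    Returns the possibly modified text and the adjusted insertion point.
    """
    if "\t" in text[line_start:cursor]:
        # Unwrapped line, or the tab wrapped too: nothing to do.
        return text, cursor
    # Put a tab at the start of the wrapped line...
    text = text[:line_start] + "\t" + text[line_start:]
    # ...and remember that everything after it moved one character over.
    return text, cursor + 1


if __name__ == "__main__":
    entry = "Very Long Name\th6"
    # Pretend only "h6" wrapped, so the wrapped line starts at index 15.
    print(ensure_tab_on_last_line(entry, 15, len(entry)))
```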

Here’s the full Nisus macro:


  • $dataFolder = '~/Documents/Writing/Databases/Cookbooks'
  • $document = Document.active
  • Select Document Start
  • #determine data file corresponding to this book
  • If $document.displayName == "Saint Mary Cookbooks"
    • $dataFile = $dataFolder.filePathByAppendingComponent 'recipes.txt'
  • Else
    • Prompt "This macro is only appropriate for the Saint Mary Cookbooks index file."
    • Prompt $document.displayName
    • Exit
  • End
  • # back up the file
  • $documentPath = $document.filePath
  • $documentFolder = $documentPath.filePathByRemovingLastComponent
  • $backupPath = $documentPath.filePathByAppendingComponentNameSuffix " backup"
  • File.requireAccessAtPath $documentFolder
  • Save To $backupPath
  • #get the headline text from the markdown headline
  • Define Command GetHeadlineText($startAt, $line)
    • $endAt = $line.length - $startAt
    • $restOfLine =  Range.new $startAt, $endAt
    • $line = $line.substringInRange $restOfLine
    • return $line
  • End
  • Define Command InsertTitle($startAt, $line)
    • Import Var 'document'
    • $line = GetHeadlineText($startAt, $line)
    • $document.insertText $line
  • End
  • Define Command InsertLine($line)
    • Import Var 'document'
    • $document.insertText $line
    • Format:Paragraph Style:Normal
    • #if the line wraps, make sure the page number is flush right
    • $endOfLine = TextSelection.activeRange
    • $lineRange = $document.text.rangeOfLineAtIndex $endOfLine.location
    • $lineCharacters = $document.text.subtextInRange $lineRange
    • If !$lineCharacters.containsString "\t"
      • $beginningOfLine = $lineRange
      • $beginningOfLine.length = 0
      • TextSelection.setActiveRange $beginningOfLine
      • $document.insertText "\t"
      • #inserted one character, so end of line has moved one over
      • $endOfLine.location = $endOfLine.location + 1
      • TextSelection.setActiveRange $endOfLine
    • End
  • End
  • #Empty out the section whose level 1 headline is the provided markdown line
  • Define Command ClearSection($line)
    • Import Var 'document'
    • $headlineText = GetHeadlineText(2, $line)
    • $headlineStyle = $document.styleWithName "Heading 1"
    • #find first Heading 1 that is $headlineText
    • While $found = Find $headlineStyle, "-W"
      • $headline = $document.selection
      • If $headline.subtext.hasPrefix($headlineText)
        • Break
      • End
    • End
    • If !$found
      • Prompt "Unable to find section " & $headlineText
      • Exit
    • End
    • #move to the columned section after the headline
    • $documentText = $document.text
    • $selection = TextSelection.activeRange
    • $headlineSection = $documentText.sectionNumberAtIndex $selection.location
    • $section = $headlineSection
    • While $headlineSection == $section
      • Select Next Paragraph
      • $selection = TextSelection.activeRange
      • $section = $documentText.sectionNumberAtIndex $selection.location
    • End
    • #select all of the columned section…
    • Select Next Paragraph
    • $selection = TextSelection.activeRange
    • $sectionRange = $documentText.rangeOfSectionAtIndex $selection.location
    • $sectionSelection = TextSelection.new $documentText, $sectionRange
    • $document.setSelection $sectionSelection
    • #…except the section character
    • Adjust Selection Start by 1
    • #delete all text in this section
    • Insert Text ""
  • End
  • #read index file and insert into correct sections
  • If File.existsAtPath $dataFile
    • File.requireAccessAtPath $dataFile
    • $text = File.readStringFromPath $dataFile
    • ForEach $line in $text.split("\n")
      • If $line
        • If $line.hasPrefix '# '
          • ClearSection($line)
        • Else
          • If $line.hasPrefix '## '
            • InsertTitle(3, $line)
            • Format:Paragraph Style:Heading 2
          • Elsif $line.hasPrefix '### '
            • InsertTitle(4, $line)
            • Format:Paragraph Style:Heading 3
          • Else
            • InsertLine($line)
          • End
          • $document.insertText "\n"
        • End
      • End
    • End
  • Else
    • Prompt "No Path $dataFile"
  • End

As you can see, it does a couple of other useful things. It verifies that I’m running this on an appropriate document. And it backs up the document before modifying it, just in case.

Also, the .rangeOfSectionAtIndex method returns the full section, including the character that sets this section off from the previous section. In most word processors, Nisus included, sections, page breaks, and column breaks are actual characters inserted in the text. You can delete the page break or column break, or the section separator, just like deleting a carriage return or any other character. In this case, though, I don’t want to delete the section, just the text in the section, so I adjust the selection’s starting point by 1. I could just as well have increased the location property on $sectionRange by one, but I would then have had to drop its length property by one as well.
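
The arithmetic behind that alternative, as a minimal Python sketch: a range is modeled here as a plain (location, length) pair, and the function name `skip_first_character` is hypothetical. Moving the start forward by one without shortening the length would push the end of the range one character past the section.

```python
def skip_first_character(location, length):
    """Shrink a (location, length) range so it no longer covers its first character."""
    if length == 0:
        # Nothing to skip in an empty range.
        return location, length
    # Start one character later, and cover one character fewer,
    # so the end of the range (location + length) stays where it was.
    return location + 1, length - 1


if __name__ == "__main__":
    print(skip_first_character(100, 50))
```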
