Generating a Table of Contents for GitHub Markdown Files

A hole I dug long ago and never got around to filling in...

By Damnever on March 31, 2015

Until now I had been building the table of contents for these study notes (honestly, more of a to-learn list) by hand. Embarrassing, I know...

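Today I finally couldn't take it anymore: the weather is getting hot, and in class the teacher had us work out tax amounts from a given tax rate...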

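While adding something to Python.md, I noticed it was approaching 2K lines, and GitHub does not generate a table of contents automatically. If I add yet another category, I will go blind hunting for sections, so I decided to finally fill in this hole!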

Improving quality of life through programming!


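First, your headings must be written in the style below for this script to work. Headings written with # would need ids added to them, which breaks the original structure, and that is beyond this humble servant...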

<h1 id="id1">H1 title</h1>
Level-1 heading
<h2 id="id2">H2 title</h2>
Level-2 heading
<h1 id="id11">H11 title</h1>
Another level-1 heading...
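To see why this heading style is easy to parse, it helps to look at the order in which `HTMLParser` fires its callbacks for the example above. The sketch below uses Python 3's `html.parser` module (the script itself imports the Python 2 name `HTMLParser`); `EventRecorder` and `SAMPLE` are hypothetical names for illustration. Each heading arrives as a start tag carrying its `id`, then the heading text as a data chunk, then the end tag, which is the pattern the script's `handle_starttag` and `handle_data` rely on.

```python
from html.parser import HTMLParser  # Python 3; Python 2 used "from HTMLParser import HTMLParser"

SAMPLE = """<h1 id="id1">H1 title</h1>
Level-1 heading
<h2 id="id2">H2 title</h2>
Level-2 heading
<h1 id="id11">H11 title</h1>
Another level-1 heading
"""


class EventRecorder(HTMLParser):
    """Hypothetical helper: records the callbacks HTMLParser fires, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a dict is easier to read.
        self.events.append(('start', tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        # Skip whitespace-only chunks so the trace stays readable.
        if data.strip():
            self.events.append(('data', data.strip()))


recorder = EventRecorder()
recorder.feed(SAMPLE)
recorder.close()  # flush any buffered trailing text
for event in recorder.events:
    print(event)
```

The first three events are `('start', 'h1', {'id': 'id1'})`, `('data', 'H1 title')` and `('end', 'h1')`; the plain text lines between headings show up as separate data chunks that a TOC generator simply has to ignore.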

The generated TOC looks like this, which should work in most Markdown editors:

*   [H1 title](#id1)
    *   [H2 title](#id2)
*   [H11 title](#id11)
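Each line of that TOC follows one rule: 4 spaces of indentation per level below `h1`, then `*   `, then a Markdown link whose target is `#` plus the heading's `id`. A minimal sketch of that rule, with `toc_line` as a hypothetical helper name (the script builds the same string inline inside its parser class):

```python
def toc_line(level, title, anchor):
    """Hypothetical helper: one TOC entry for an <hN id="anchor">title</hN>.

    Indentation is 4 spaces per level below h1, mirroring the script's
    (int(tag[-1]) - 1) * 4 computation.
    """
    indent = ' ' * (4 * (level - 1))
    return indent + '*   [' + title + '](#' + anchor + ')'


headings = [(1, 'H1 title', 'id1'), (2, 'H2 title', 'id2'), (1, 'H11 title', 'id11')]
print('\n'.join(toc_line(*h) for h in headings))
```

Running it prints the three-line TOC shown above. Four spaces per level is what makes the nested `*` items render as sub-lists in common Markdown renderers.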

Here is the help output:

*damnever->>> python toc_gen.py -h
usage: toc_gen.py [-h] [-S src] [-D des]

Generates TOC for markdown file.

optional arguments:
      -h, --help  show this help message and exit
      -S src      A path of source file.
      -D des      A file path to store TOC.

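-S takes the path of the source Markdown file, and -D takes the path of the file the TOC should be written to; relative and absolute paths both work. If no destination file is given, the TOC is printed to the screen.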
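The two flags map directly onto `argparse.add_argument` calls, with a `type=` callable used as a validator. One detail matters for "relative paths work": a bare file name such as `toc.md` has an empty `os.path.dirname`, and `os.path.exists('')` is `False`, so a directory check has to fall back to `'.'`. The sketch below assumes that fallback; `dest_dir_ok` is a hypothetical validator name, and `-S` is left unvalidated here only so the example runs without a real `Python.md`.

```python
import argparse
import os


def dest_dir_ok(fpath):
    """Hypothetical validator for -D.

    A bare file name like 'toc.md' has an empty dirname, so check the
    current directory '.' instead of the empty string.
    """
    directory = os.path.dirname(fpath) or '.'
    if os.path.isdir(directory):
        return fpath
    raise argparse.ArgumentTypeError(
        "Invalid destination file path, directory {0!r} doesn't exist.".format(directory))


parser = argparse.ArgumentParser(description="Generates TOC for markdown file.")
parser.add_argument('-S', metavar='src', dest='src', help="A path of source file.")
parser.add_argument('-D', type=dest_dir_ok, metavar='des', dest='des',
                    help="A file path to store TOC.")

# Parse an explicit list instead of sys.argv, for illustration.
args = parser.parse_args(['-S', 'Python.md', '-D', 'toc.md'])
print(args.src, args.des)
print(repr(os.path.dirname('toc.md')))  # the empty dirname that needs the '.' fallback
```

When `-D` is omitted, argparse leaves `args.des` at its default `None` without calling the `type` function, which is what lets the script fall back to printing the TOC.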

OK, I'm rambling again. Here is the script's source: toc_gen.py


It mainly relies on HTMLParser and argparse.

from __future__ import print_function

import io
import os
import argparse

try:
    from HTMLParser import HTMLParser      # Python 2
except ImportError:
    from html.parser import HTMLParser     # Python 3


def get_toc(html):
    """Build a Markdown TOC from <h1>..<h6> tags that carry an id attribute."""

    toc_list = []

    class MyHTMLParser(HTMLParser):

        _prefix = ''
        _id = ''
        _title = ''

        def handle_starttag(self, tag, attrs):
            # Only h1..h6 are headings; indent 4 spaces per level below h1.
            if len(tag) == 2 and tag[0] == 'h' and tag[1] in '123456':
                space = (int(tag[1]) - 1) * 4
                self._prefix = space * ' ' + '*   '
            attrs = dict(attrs)
            if self._prefix and 'id' in attrs:
                self._id = '(#' + attrs['id'] + ')'

        def handle_data(self, data):
            # The first text chunk after a heading tag becomes the link text.
            if self._prefix:
                self._title = '[' + data.strip() + ']'
                toc_list.append(self._prefix + self._title + self._id)
            self._prefix = ''
            self._id = ''
            self._title = ''

    parser = MyHTMLParser()
    parser.feed(html)
    parser.close()
    # u'' keeps the result unicode on Python 2, even when no heading is found.
    return u'\n'.join(toc_list)


def read(fpath):
    # Assumes the Markdown file is UTF-8 encoded (works on Python 2 and 3).
    with io.open(fpath, 'r', encoding='utf-8') as f:
        return f.read()


def write(fpath, toc):
    with io.open(fpath, 'w', encoding='utf-8') as f:
        f.write(toc)


def parse_args():
    parser = argparse.ArgumentParser(
            description="Generates TOC for markdown file.")
    parser.add_argument(
            '-S',
            type=file_check,
            default=None,
            help="A path of source file.",
            metavar='src',
            dest='src')
    parser.add_argument(
            '-D',
            type=path_check,
            default=None,
            help="A file path to store TOC.",
            metavar='des',
            dest='des')
    args = parser.parse_args()
    if args.src is None:
        # Without a source file there is nothing to read: print usage and exit.
        parser.error("argument -S is required")
    return args.src, args.des


def file_check(fpath):
    if os.path.isfile(fpath):
        return fpath
    raise argparse.ArgumentTypeError("Invalid source file path,"
            " {0} doesn't exist.".format(fpath))


def path_check(fpath):
    # A bare file name like 'toc.md' has an empty dirname: check '.' instead.
    path = os.path.dirname(fpath) or '.'
    if os.path.isdir(path):
        return fpath
    raise argparse.ArgumentTypeError("Invalid destination file path,"
            " directory {0} doesn't exist.".format(path))


def main():
    src, des = parse_args()
    toc = get_toc(read(src))
    if des:
        write(des, toc)
        print(u"TOC of '{0}' has been written to '{1}'".format(
                    os.path.abspath(src),
                    os.path.abspath(des)))
    else:
        print(u"TOC for '{0}':\n{1}".format(
                    os.path.abspath(src),
                    toc))


if __name__ == '__main__':
    main()
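One limitation of the listing above: it takes only the *first* text chunk after a heading's start tag as the title, so a heading with inline markup such as `<h2 id="b">Using <code>map</code></h2>` comes out as `[Using](#b)`. The sketch below is a hypothetical variant (not the original script's approach; `HeadingCollector` and `get_toc_v2` are illustrative names, written for Python 3's `html.parser`) that collects every text chunk until the matching closing `</hN>` tag and then emits the entry in the same format.

```python
from html.parser import HTMLParser  # Python 3 name of the module

HEADING_TAGS = {'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}


class HeadingCollector(HTMLParser):
    """Hypothetical variant: gathers all text between <hN> and </hN>,
    so inline tags inside a heading no longer truncate its title."""

    def __init__(self):
        super().__init__()
        self.entries = []   # finished TOC lines
        self._level = None  # level of the heading currently open, or None
        self._id = None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._id = dict(attrs).get('id')
            self._parts = []

    def handle_data(self, data):
        # Text outside headings is ignored.
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == 'h%d' % self._level:
            title = ' '.join(''.join(self._parts).split())  # collapse whitespace
            link = '[' + title + ']'
            if self._id:  # headings without an id get a bare [title]
                link += '(#' + self._id + ')'
            self.entries.append(' ' * 4 * (self._level - 1) + '*   ' + link)
            self._level = None


def get_toc_v2(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return '\n'.join(collector.entries)


print(get_toc_v2('<h1 id="a">Intro</h1>\n<h2 id="b">Using <code>map</code></h2>'))
```

For that input it prints `*   [Intro](#a)` followed by `    *   [Using map](#b)`. Tracking the open heading level, rather than resetting state on the next data chunk, is what allows nested tags like `<code>` to pass through without ending the title early.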