{"url":"https://gepuro.hatenadiary.org/entry/20111014/1318610472","published":"2011-10-14 01:41:12","width":"100%","categories":["python","cabocha"],"blog_title":"gepuro\u306e\u65e5\u8a18","height":"190","provider_url":"https://hatena.blog","html":"<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Fgepuro.hatenadiary.org%2Fentry%2F20111014%2F1318610472\" title=\"Parsing the XML file output by CaboCha - gepuro\u306e\u65e5\u8a18\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>","description":"The XML file produced with cabocha by running $ cabocha -f 3 hoge.txt > hoge.xml cannot be parsed as is, so it needs a little extra processing. (The input must be split into separate lines beforehand.) #!/usr/bin/python # -*- coding:utf-8 -*- import re p = re.compile(r'\".*?\"') def article(file): xml = open(file).readlines() sentenceid = 0 print \"<article>\" for line in xml: if\u2026","version":"1.0","provider_name":"Hatena Blog","title":"Parsing the XML file output by CaboCha","image_url":null,"type":"rich","blog_url":"https://gepuro.hatenadiary.org/","author_name":"gepuro","author_url":"https://blog.hatena.ne.jp/gepuro/"}