Wednesday, October 27, 2010

Stripping HTML tags from web pages

# Routine by Micah D. Cochran
# Submitted on 26 Aug 2005
# This routine may be put under any Open Source license (GPL, BSD, LGPL, etc.)
# or any proprietary license. Effectively this routine is in the public domain. Please attribute where appropriate.
def strip_ml_tags(in_text):
  """Description: Removes all HTML/XML-like tags from the input text.
  Inputs: in_text --> string of text
  Outputs: text string without the tags
  
  # doctest unit testing framework
  
  >>> test_text = "Keep this Text  KEEP  123"
  >>> strip_ml_tags(test_text)
  'Keep this Text  KEEP  123'
  """
  # convert in_text to a mutable object (e.g. list)
  s_list = list(in_text)
  i = 0

  while i < len(s_list):
    # iterate until a left-angle bracket is found
    if s_list[i] == '<':
      while i < len(s_list) and s_list[i] != '>':
        # pop everything from the left-angle bracket up to the right-angle bracket
        s_list.pop(i)
      # pop the right-angle bracket, too (skipped if the tag was never closed)
      if i < len(s_list):
        s_list.pop(i)
    else:
      i = i + 1

  # convert the list back into text
  join_char=''
  return join_char.join(s_list)

def bbToHtmltags(html):
  pass
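
The routine above scans the text one character at a time. As a point of comparison, here is a minimal sketch that gets a similar result with the standard library's html.parser module, which also decodes entities such as &amp; and skips comments. The names TextOnly and strip_tags_parser are made up for this illustration and are not part of the original routine.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical helper: collects only the text found between tags."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # called for every run of text that is not inside a tag or comment
        self.parts.append(data)


def strip_tags_parser(markup):
    """Return the text of markup with tags and comments removed."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


print(strip_tags_parser("<p>Tom &amp; Jerry<!-- hidden --></p>"))  # prints: Tom & Jerry
```

This is only one possible approach; whether entity decoding is wanted depends on what the stripped text is used for.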

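This comes in handy all the time when building guestbooks, forums, blogs, and other web-related projects.
Removes all HTML/XML-like tags from the input text.
Inputs: in_text --> string of text
Outputs: text string without the tags
The docstring already explains it clearly.
Given web markup such as "Keep this Text  KEEP  123",
the program reads it, removes everything inside the <> brackets, and outputs what is left.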
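
To make the "remove everything inside <>" idea concrete, here is a minimal sketch that does the same job with a regular expression instead of the character-by-character loop. It is a swapped-in technique, not the original author's method, and the names strip_tags_re and _TAG_RE, along with the sample string, are assumptions made for this example.

```python
import re

# Hypothetical pattern: a '<', then any run of characters that are not '>', then '>'
_TAG_RE = re.compile(r'<[^>]*>')


def strip_tags_re(text):
    """Remove every '<...>' span from text and return what is left."""
    return _TAG_RE.sub('', text)


sample = "Keep <b>this</b> Text <br /> KEEP <i>123</i>"
print(strip_tags_re(sample))  # prints: Keep this Text  KEEP 123
```

One difference worth noting: a '<' that is never closed by a '>' is left in place by this sketch, because the pattern only matches complete '<...>' spans.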

Monday, October 25, 2010

Books for learning Python

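Python基礎教程 (Python Basics Tutorial), 2nd edition, high-definition PDF in Chinese
Python學習手冊 (Learning Python, 3rd Edition), Chinese edition PDF
Python核心編程 (Core Python Programming), 2nd edition, high-definition PDF, Chinese edition
Python核心編程 (Core Python Programming), 2nd edition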