Jack's Blog

A flowing heart cannot be held back; a blowing wind cannot be stopped.

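Lab 2

I. Objectives

1. Learn to create and use user-defined classes

2. Learn to use the matplotlib module

II. Tasks

The file UN.txt holds information on the 193 member states of the United Nations. Each line gives a country's name, continent, population (in millions), and area (in square miles), for example: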

Canada,North America,34.8,3855000
France,Europe,66.3,211209
New Zealand,Australia/Oceania,4.4,103738
Nigeria,Africa,177.2,356669
Pakistan,Asia,196.2,310403
Peru,South America,30.1,496226

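(a) Create a Nation class with four instance variables that store a country's information and a method named pop_density that computes the country's population density. Use this class in a program that builds a dictionary of 193 entries, each of the form: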

name of a country: Nation object for that country

Build the dictionary from UN.txt, save it to a permanent binary file named nationsDict.dat, and save the Nation class in the file nation.py.
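Because UN.txt stores the population in millions, dividing the two fields directly gives millions of people per square mile. Here is a minimal sketch of the conversion, using the Canada line from the sample above. The helper name people_per_square_mile is made up for this illustration and is not part of the assignment.

```python
def people_per_square_mile(pop_millions, area_sq_miles):
    """Return people per square mile for a population given in millions."""
    return pop_millions * 1_000_000 / area_sq_miles


# Parse one line in the UN.txt format shown above.
line = "Canada,North America,34.8,3855000"
name, continent, pop, area = line.strip().split(",")

# The fields arrive as strings, so convert them before doing arithmetic.
density = people_per_square_mile(float(pop), float(area))
print(f"{name}: {density:.2f} people per square mile")
```

Whether the density is reported per person or per million people does not change the ranking within a continent, only the scale of the numbers.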

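(b) Using nationsDict.dat and nation.py, write a program (search.py) that takes the name of a UN member state as input and displays all of that country's information. For example: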

Enter a country: Canada
Continent: North America
Population: 34,800,000
Area: 3,855,000.00 square miles
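The sample output groups digits with commas and shows the area to two decimal places. One way to get that layout is Python's format specification mini-language. The sketch below assumes the population is held as a float in millions, as in UN.txt; the variable names are illustrative only.

```python
pop_millions = 34.8      # population as stored in UN.txt (millions)
area = 3855000.0         # area in square miles

# ',' inserts thousands separators; '.0f' and '.2f' set the decimal places.
pop_text = f"{pop_millions * 1_000_000:,.0f}"
area_text = f"{area:,.2f}"

print("Population:", pop_text)
print("Area:", area_text, "square miles")
```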

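(c) Using nationsDict.dat and nation.py, write a program (sort.py) that takes the name of a continent as input and uses matplotlib's bar chart to plot, in descending order, the names and population densities of the ten UN member states on that continent with the highest population density.

Package nationsDict.dat, nation.py, search.py, and sort.py into one archive for upload, named in the form StudentID_Name_Lab2 (originally 学号_姓名_实验2).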

nation.py

import pickle


class Nation:
    """One UN member state: name, continent, population (millions), area (square miles)."""

    def __init__(self, name='', continent='', pop=0.0, area=0.0):
        self._name = name
        self._continent = continent
        self._pop = pop      # population in millions
        self._area = area    # area in square miles

    def setName(self, name):
        self._name = name

    def setContinent(self, continent):
        self._continent = continent

    def setPop(self, pop):
        self._pop = pop

    def setArea(self, area):
        self._area = area

    def getName(self):
        return self._name

    def getContinent(self):
        return self._continent

    def getPop(self):
        return self._pop

    def getArea(self):
        return self._area

    def pop_density(self):
        # UN.txt gives the population in millions, so scale to people per square mile.
        return self._pop * 1000000 / self._area

    def __str__(self):
        return ("The population density of " + str(self._name) + " is " + str(self.pop_density()))


def createDictionary(fileName):
    # Import the class through the module name so the pickle records it as
    # nation.Nation rather than __main__.Nation; other scripts can then load it.
    from nation import Nation
    nations = {}
    with open(fileName) as f:
        for line in f:
            words = line.strip().split(",")
            if len(words) != 4:
                continue  # skip blank or malformed lines
            # The numeric fields are read as text, so convert them to float.
            nations[words[0]] = Nation(words[0], words[1], float(words[2]), float(words[3]))
    return nations


if __name__ == "__main__":
    # Guarded so that "import nation" in other scripts does not rebuild the file.
    nations = createDictionary("UN.txt")
    with open("nationsDict.dat", 'wb') as outfile:
        pickle.dump(nations, outfile)

search.py

import pickle
import nation  # nation.py must be importable so pickle can rebuild the stored objects


def getDictionary(fileName):
    with open(fileName, 'rb') as infile:
        return pickle.load(infile)


def inputNameOfNation(nations):
    name = input("Enter a country: ")
    while name not in nations:
        print("Not a member of the UN. Please try again.")
        name = input("Enter a country: ")
    return name


def displayData(nations, name):
    country = nations[name]
    print("Continent:", country.getContinent())
    # The population is stored in millions; print it as a number of people.
    print("Population: {:,.0f}".format(float(country.getPop()) * 1000000))
    print("Area: {:,.2f} square miles".format(float(country.getArea())))


nations = getDictionary("nationsDict.dat")
name = inputNameOfNation(nations)
displayData(nations, name)

sort.py

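import pickle

import matplotlib.pyplot as plt
import nation  # nation.py must be importable so pickle can rebuild the stored objects


def getDictionary(fileName):
    with open(fileName, 'rb') as infile:
        return pickle.load(infile)


nations = getDictionary("nationsDict.dat")
continent = input("Enter a continent: ")

# Keep the countries on the chosen continent, then rank them by density.
members = [n for n in nations.values() if n.getContinent() == continent]
members.sort(key=lambda n: n.pop_density(), reverse=True)
top10 = members[:10]

names = [n.getName() for n in top10]
densities = [n.pop_density() for n in top10]

plt.bar(names, densities)
plt.xticks(rotation=45, ha='right')
plt.ylabel("Population density")
plt.title("Top 10 population densities: " + continent)
plt.tight_layout()
plt.show()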


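I do not recommend that anyone copy this code directly.

If you are a student at Harbin Institute of Technology, please do not copy this code. It has already been entered into the plagiarism-detection system, and once the similarity exceeds 20%, the submission will be treated as plagiarism.

The lab will then receive a score of 0.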

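Python: Inverted Index

I.       Objectives

  1. Learn the basic operations on lists, sets, and dictionaries (definition, assignment, use) and become familiar with the general workflow for handling complex data types
  2. Become familiar with the common functions and techniques for lists, sets, and dictionaries
  3. Practice flexible text processing and the use of sorting algorithms

II.       Tasks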

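An inverted index is an indexing method that records which documents each word occurs in. It is the most commonly used data structure in information retrieval systems. With an inverted index, the list of documents containing a given word can be retrieved quickly from the word itself.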
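As a rough illustration of the idea, the sketch below builds a small inverted index in which each input line plays the role of a document. The function name build_index and the three sample lines are invented for this example and are not part of the assignment.

```python
from collections import defaultdict


def build_index(lines):
    """Map each word to the set of 1-based line numbers it appears on."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines, start=1):
        for word in line.split():
            index[word].add(line_no)
    return index


docs = ["the cat sat", "the dog ran", "a cat ran"]
index = build_index(docs)

# Looking up a word returns the lines that contain it.
print(sorted(index["cat"]))
print(sorted(index["ran"]))
```

Using a set for each entry means a word that appears twice on the same line is still recorded only once.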

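This lab implements the following four functions:

(1). Build the index: first read 100 lines of input, which are used to build the inverted index. Each line consists of words made up entirely of lowercase letters, with no punctuation, separated by spaces. Read each word in turn and build a dictionary of <word, set of line numbers on which the word occurs> pairs, with line numbers starting at 1.

(2). Print the index: output each word and the positions where it occurs, in alphabetical order, with each word's positions listed in ascending line-number order. For example, if "created" occurs on lines 3 and 20 and "dead" occurs on lines 14, 20, and 22, the output is as follows (a single space follows each colon and comma; line numbers are not repeated):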

created: 3, 20

dead: 14, 20, 22

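(3). Search: next, read the query strings, one query per line. Each query consists of several keywords separated by spaces, all of them lowercase words. Output the line numbers (in ascending order) of the lines that contain every keyword, one output line per query. If some keyword never occurs in any line, or no line contains all of the keywords, output "None". An empty line marks the end of the queries. For the index built above, for example, the query "created" outputs "3, 20", the query "created dead" outputs "20", and the query "abcde dead" outputs "None".

(4). Advanced search: if a query begins with "AND:", perform an AND search, that is, output the lines that contain every keyword. If a query begins with "OR:", perform an OR search, where a line qualifies as soon as it contains any one keyword. By default (no leading "AND:" or "OR:"), perform an AND search.

Implement the functions above in order (name the submitted program in the form StudentID_Name_5.py, originally "学号_姓名_5.py").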
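The AND and OR searches correspond to set intersection and set union over the line-number sets. Here is a small sketch using the "created" and "dead" entries from the example above. The function name lookup and its mode parameter are assumptions made for this illustration, and treating an unknown keyword as an empty set in OR mode is one possible reading of the task rather than a stated requirement.

```python
# Line-number sets taken from the example in part (2).
index = {"created": {3, 20}, "dead": {14, 20, 22}}


def lookup(index, keywords, mode="AND"):
    """Return the sorted line numbers matching the keywords under AND or OR."""
    # A keyword that is not in the index contributes an empty set.
    postings = [index.get(word, set()) for word in keywords]
    if not postings:
        return []
    if mode == "OR":
        result = set().union(*postings)          # any keyword is enough
    else:
        result = set.intersection(*postings)     # every keyword is required
    return sorted(result)


print(lookup(index, ["created", "dead"]))
print(lookup(index, ["created", "dead"], "OR"))
print(lookup(index, ["abcde", "dead"]))
```

An empty result list is the case where the program should print "None".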


The code follows (Python 2):

    # Part 1 : Set up the index

    dict = {}  # an empty dictionary: word -> set of line numbers
    n = 100
    for row in range(0, n):

        information = raw_input()

        # split the input line into words on whitespace
        line_words = information.split()

        for word in line_words:  # check every word on the line

            # the word appears for the first time
            if word not in dict:
                item = set()       # create a new set
                item.add(row + 1)  # line numbers start at 1
                dict[word] = item  # store the set under the word

            # the word has appeared before
            else:
                dict[word].add(row + 1)  # add the current line number

    # print dict    # uncomment to inspect the index
      
                  
    # Part 2 : Print the index

    word_list = dict.items()  # list of (word, set of line numbers) pairs

    word_list.sort(key=lambda items: items[0])  # sort alphabetically by word

    for word, row in word_list:  # go through each word and its line numbers

        list_row = list(row)
        list_row.sort()

        # convert the line numbers from int to str
        for i in range(0, len(list_row)):
            list_row[i] = str(list_row[i])

        # print the line in the format Part 2 requires
        print word + ':', ', '.join(list_row)
      
      
    # Part 3 : Query
    # judger checks whether every keyword of a query is in dict.
    def judger(dict, query):
        list_query = query.split()
        for word in list_query:
            if word not in dict:
                return 0    # at least one keyword is missing from dict
        return 1   # every keyword is in dict

    query_list = []

    # read queries until an empty line is entered
    while True:
        query = raw_input()
        if query == '':
            break
        elif len(query) != 0:
            query_list.append(query)  # collect the query in query_list


    # go through every query in query_list
    for list_query in query_list:

        # some keyword never appears in the input lines
        if judger(dict, list_query) == 0:
            print 'None'

        else:
            list_query = list_query.split()
            query_set = set()  # start with an empty set

            # union: collect every line that contains any keyword
            for isquery in list_query:
                query_set = query_set | dict[isquery]

            # intersection: keep only the lines that contain every keyword
            for isquery in list_query:
                query_set = query_set & dict[isquery]

            # no line contains all of the keywords
            if len(query_set) == 0:
                print 'None'

            else:
                query_result = list(query_set)
                query_result.sort()
                for m in range(len(query_result)):
                    query_result[m] = str(query_result[m])

                print ', '.join(query_result)

For Python 3:

word_dic = {}
line_len = 100  # number of input lines required by the assignment (a small value such as 3 is handy for testing)

# Build the index
for i in range(line_len):
    line = input()
    words = line.split()
    for word in words:
        if word in word_dic:
            word_dic[word].add(i+1)
        else:
            word_set = set()
            word_set.add(i+1)
            word_dic[word] = word_set

# Print the index
# Sort the (word, set of line numbers) pairs alphabetically by word.
word_list = sorted(word_dic.items(), key=lambda item: item[0])

for word, line in word_list:
    list_line = sorted(line)
    for i in range(len(list_line)):
        list_line[i] = str(list_line[i])

    # Print once per word, after all line numbers have been converted.
    print(word + ': ' + ', '.join(list_line))

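# Search and advanced search
def judger(index, keywords):
    # Return True only if every keyword occurs in the index.
    for word in keywords:
        if word not in index:
            return False
    return True

query_list = []
while True:
    query = input()
    if query == '':
        break
    query_list.append(query)

for query in query_list:
    # An "AND:" or "OR:" prefix selects the mode; AND is the default.
    mode = 'AND'
    if query.startswith('AND:'):
        query = query[len('AND:'):]
    elif query.startswith('OR:'):
        mode = 'OR'
        query = query[len('OR:'):]
    list_query = query.split()

    query_set = set()
    if mode == 'OR':
        # union: a line qualifies if it contains any keyword
        for isquery in list_query:
            query_set = query_set | word_dic.get(isquery, set())
    elif list_query and judger(word_dic, list_query):
        # intersection: a line qualifies only if it contains every keyword
        query_set = set(word_dic[list_query[0]])
        for isquery in list_query:
            query_set = query_set & word_dic[isquery]

    if len(query_set) == 0:
        print('None')
    else:
        query_result = sorted(query_set)
        print(', '.join(str(m) for m in query_result))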