Channel: Active questions tagged email - Stack Overflow

Error Extracting Outlook Email Data with Python

I have a Python script that uses os.walk and win32com.client to extract information from Outlook email files (.msg) in a folder and its subfolders on my C:/ drive. The extraction appears to work, but Python crashes as soon as I do anything with the returned dataframe (such as emailData.head()). I also cannot write the dataframe to a .csv file because of a permission error.

I'm wondering whether my code is failing to close Outlook, or each message, properly, and whether that is what causes the problem. Any help would be appreciated.

import os
import win32com.client
import pandas as pd

# initialize Outlook client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

# set input directory (where the emails are) and output directory (where you
# would like the email data saved)
inputDir = 'C:/Users/.../myFolderPath'
outputDir = 'C:/Users/.../myOutputPath'


def emailDataCollection(inputDir,outputDir):
    """ This function loops through an input directory to find
    all '.msg' email files in all folders and subfolders in the
    directory, extracting information from the email into lists,
    then converting the lists to a Pandas dataframe before exporting
    to a '.csv' file in the output directory
    """
    # Initialize lists
    msg_Path = []
    msg_SenderName = []
    msg_SenderEmailAddress = []
    msg_SentOn = []
    msg_To = []
    msg_CC = []
    msg_BCC = []
    msg_Subject = []
    msg_Body = []
    msg_AttachmentCount = []

    # Loop through the directory
    for root, dirnames, filenames in os.walk(inputDir):
        for filename in filenames:
            if filename.endswith('.msg'): # check to see if the file is an email
                filepath = os.path.join(root,filename) # save the full filepath
                # Extract email data into lists
                msg = outlook.OpenSharedItem(filepath)
                msg_Path.append(filepath)
                msg_SenderName.append(msg.SenderName)
                msg_SenderEmailAddress.append(msg.SenderEmailAddress)
                msg_SentOn.append(msg.SentOn)
                msg_To.append(msg.To)
                msg_CC.append(msg.CC)
                msg_BCC.append(msg.BCC)
                msg_Subject.append(msg.Subject)
                msg_Body.append(msg.Body)
                msg_AttachmentCount.append(msg.Attachments.Count)
                del msg

    # Convert lists to Pandas dataframe
    emailData = pd.DataFrame({
        'Path': msg_Path,
        'SenderName': msg_SenderName,
        'SenderEmailAddress': msg_SenderEmailAddress,
        'SentOn': msg_SentOn,
        'To': msg_To,
        'CC': msg_CC,
        'BCC': msg_BCC,
        'Subject': msg_Subject,
        'Body': msg_Body,
        'AttachmentCount': msg_AttachmentCount
    }, columns=['Path', 'SenderName', 'SenderEmailAddress', 'SentOn', 'To',
                'CC', 'BCC', 'Subject', 'Body', 'AttachmentCount'])


    return emailData


# Call the function
emailData = emailDataCollection(inputDir,outputDir)

# Causes Python to crash
emailData.head()
# Fails due to permission error
emailData.to_csv(outputDir,header=True,index=False)
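
For comparison, here is a minimal sketch of a variant that closes each message explicitly and stores only plain Python values. It is not a confirmed fix. It assumes the crash may come from COM-backed values (such as the SentOn time object) ending up inside the dataframe, and that the item's Close method accepts Outlook's olDiscard value (1). The names to_plain, read_msg, collect_rows, FIELDS and OL_DISCARD are made up for illustration.

```python
import datetime
import os

# Assumption: 1 is Outlook's olDiscard, i.e. close the item without saving.
OL_DISCARD = 1

# Properties read from each message, matching the columns in the question.
FIELDS = ["SenderName", "SenderEmailAddress", "SentOn", "To", "CC", "BCC",
          "Subject", "Body"]


def to_plain(value):
    """Convert a value read from a COM object into a plain Python value."""
    if value is None:
        return ""
    if isinstance(value, datetime.datetime):
        # Keep only an ISO string so no COM time object reaches pandas.
        return value.isoformat()
    if isinstance(value, (int, float, str)):
        return value
    # Anything else is stored as its string form.
    return str(value)


def read_msg(namespace, filepath):
    """Read one .msg file into a dict and always close the item afterwards."""
    msg = namespace.OpenSharedItem(filepath)
    try:
        row = {"Path": filepath}
        for field in FIELDS:
            row[field] = to_plain(getattr(msg, field))
        row["AttachmentCount"] = int(msg.Attachments.Count)
        return row
    finally:
        # Release the file even if reading a property raised an error.
        msg.Close(OL_DISCARD)


def collect_rows(namespace, input_dir):
    """Walk input_dir and return one dict per .msg file found."""
    rows = []
    for root, _dirnames, filenames in os.walk(input_dir):
        for filename in filenames:
            if filename.lower().endswith(".msg"):
                rows.append(read_msg(namespace, os.path.join(root, filename)))
    return rows
```

With the outlook namespace object from the script above, collect_rows(outlook, inputDir) would return a list of dicts that can be passed to pd.DataFrame. Writing it with to_csv(os.path.join(outputDir, 'emailData.csv'), index=False) passes a file path, where 'emailData.csv' is a placeholder name; the call above passes the output directory itself, which may be what triggers the permission error.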
