Extracting components of long file names into custom metadata properties

LouisaVenter

Good day,
I am an ECM Consultant for Datacentrix in South Africa. I have a migration-related question. I hope you can assist.
The issue is this:
1. The source system is a Windows shared drive
2. The target system is a document management solution
3. In the source system, the file names are very long. Most of them are more than 256 characters, thus making it impossible to migrate them to the target system
4. What we need to be able to do is to make the file names in the source system shorter by removing some components from the file names and then pulling them into custom advanced property fields that will be mapped to the metadata fields in the target system. An example of this type of file name is:
Hester Johanna Cornelia Catahrina Janse van Vuuren van Rensburg Winters Opperman van Oudshoorn ID 650909 5005 086.doc
I need the ID number to be the file name and all the name components to be custom metadata.
5. The migration tool we are using creates an MS Excel report of the content of the shared drive. This report is then used to do a data clean-up (e.g., making sure that date fields contain dates, not text). We cannot, however, change any of the long file names in this MS Excel report, because the migration tool would then no longer recognise the files and could not migrate them.
6. Therefore, we need a method to reduce the file names in the source system while moving the excess information to custom advanced properties for each file.

Do you perhaps know how to do this?
 
Good day,

Indeed, you are dealing with quite a complex situation, but it's not impossible to manage. By using a script-based approach, you can extract components of long file names into custom metadata properties. Here's a simplified guide using PowerShell, an automation framework designed for Windows:

Before you start, please ensure you have a backup of your data as this process will change file names.

1. Open a PowerShell terminal.
2. Navigate to the folder that contains your files, using the "cd" command.

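Then run a script that shortens the file names and records the removed information for each file:

```powershell
# Pattern: "<names> ID <digits and spaces>"; adjust it to your actual naming format
$pattern = '^(?<names>.+?)\s+ID\s+(?<id>\d[\d ]*\d)\s*$'
$log = @()   # old name, new name and extracted names for every renamed file

foreach ($file in Get-ChildItem -File) {
    # Leave files alone that do not follow the expected pattern
    if (-not ($file.BaseName -cmatch $pattern)) {
        Write-Warning "Skipped (no ID found): $($file.Name)"
        continue
    }
    $nameComponents = $Matches['names'].Trim()
    $newFileName    = $Matches['id'] + $file.Extension   # the full ID, spaces included

    # Do not overwrite another file that already has this ID as its name
    if (Test-Path -LiteralPath (Join-Path $file.DirectoryName $newFileName)) {
        Write-Warning "Skipped (target exists): $newFileName"
        continue
    }

    # Shell.Application's GetDetailsOf() can only read properties, so the
    # names are recorded in a mapping file instead of on the file itself
    $log += [pscustomobject]@{
        OriginalName   = $file.Name
        NewName        = $newFileName
        NameComponents = $nameComponents
    }

    # Rename the file
    Rename-Item -LiteralPath $file.FullName -NewName $newFileName
}

# Append to the mapping file so that a second run does not erase earlier entries
$log | Export-Csv -Path 'rename-map.csv' -NoTypeInformation -Encoding UTF8 -Append
```

This script extracts the full ID from each file name (all digit groups, not only the first) and uses it as the new file name. It records the remaining part of the original name (the person's names), together with the old and new file names, in rename-map.csv. The Shell.Application object can read file details through GetDetailsOf but cannot set them, so the script does not write the names onto the files themselves. The mapping file can instead be used to fill the metadata fields for the target system. Files that do not match the pattern, or whose new name is already taken, are skipped with a warning.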

Please customize the PowerShell script according to your needs. Test it with a subset of files first, as this script will modify your original files. If the existing script doesn't provide the desired output, I would strongly recommend soliciting the assistance of a professional programmer to precisely tune the script to your needs.
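
One way to run such a test without touching any file is a dry run over the names alone. The following is a minimal sketch, assuming the file names follow the "<names> ID <digits>" layout of the example in the question. The function name `Get-RenamePlan` and the pattern are illustrative assumptions and would need adjusting to the real data.

```powershell
# Hypothetical helper: works on name strings only and never touches the disk.
function Get-RenamePlan {
    param([string[]]$FileName)

    # Assumed pattern: "<names> ID <digits and spaces>"
    $pattern = '^(?<names>.+?)\s+ID\s+(?<id>\d[\d ]*\d)\s*$'

    foreach ($name in $FileName) {
        $base = [System.IO.Path]::GetFileNameWithoutExtension($name)
        $ext  = [System.IO.Path]::GetExtension($name)

        if ($base -cmatch $pattern) {
            [pscustomobject]@{
                OriginalName   = $name
                NewName        = $Matches['id'] + $ext
                NameComponents = $Matches['names'].Trim()
            }
        }
        else {
            # No ID found: report the file and leave the new name empty
            [pscustomobject]@{
                OriginalName   = $name
                NewName        = $null
                NameComponents = $null
            }
        }
    }
}

# Preview the plan for the current folder
$plan = @(Get-RenamePlan -FileName (Get-ChildItem -File).Name)
$plan | Format-List

# Two files with the same ID would end up with the same name, so list those
$plan | Where-Object { $_.NewName } | Group-Object NewName | Where-Object { $_.Count -gt 1 }
```

Rows with an empty NewName are files that would be left as they are, and any group listed by the last command points to an ID that occurs more than once and needs a decision before renaming.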

Remember, this is just part of the process. Once this operation is completed, you'd need to check how these updated files with their metadata are migrated using your migration tool. It might need customization there as well to ensure the metadata follows along during the migration.
 