Power BI publisher for Excel (Preview)

With Microsoft Power BI publisher for Excel, you can take snapshots of your most important insights in Excel, like PivotTables, Charts, and ranges and pin them to dashboards in Power BI.

What can you pin? Just about anything in an Excel worksheet. You can select a range of cells from a simple sheet or table, a PivotTable or PivotChart, illustrations and images, text. What you can’t pin: You cannot pin 3D Maps or visualizations in Power View sheets. There are also some elements you can pin, but it wouldn’t make much sense to, like a Slicer or Timeline filter.

When you pin an element from Excel, a new tile is added to a new or existing dashboard in Power BI. The new tile is a snapshot, so it’s not dynamic, but you can still update it. For example, if you make a change to a PivotTable or Chart you’ve already pinned, the dashboard tile in Power BI isn’t updated automatically, but you can still update your pinned elements by using Pin Manager. You’ll learn more about Pin Manager below.

Download and install

Power BI publisher for Excel is an add-in you can download and install on desktop versions of Microsoft Excel 2010 with SP1 and later.

Once you have the publisher installed, you’ll see a new Power BI ribbon in Excel, where you can sign-in and out of Power BI, pin elements to dashboards, and manage elements you’ve already pinned.

The Power BI publisher for Excel add-in is enabled by default, but if for some reason you don’t see the Power BI ribbon tab in Excel, you’ll need to enable it. Click File > Options > Add-ins > COM Add-ins. Select Microsoft Power BI publisher for Excel.

Pin a range to a dashboard

You can select any range of cells from your worksheet and pin a snapshot to any number of existing or new dashboards in Power BI.

  1. In your worksheet, select a range, and then click Pin. If you’re not already signed into Power BI, you’ll be prompted to.
  2. In Pin to dashboard, select an existing dashboard or create a new one, and then click Pin.

Pin a Chart to a dashboard

Just click on the chart, and then click Pin .

Manage pinned elements

With Pin Manager, you can update (refresh) a pinned element’s associated tile in Power BI. You can also remove the pin between an element you’ve already pinned to dashboards in Power BI.

To update tiles in your dashboard, in Pin Manager, select one or more elements and then click Update.

To remove the mapping between a pinned element in Excel and the associated tile in a dashboard, click Remove. When you click Remove, you’re not removing the element from your worksheet in Excel or deleting the associated tile in the dashboard. You are removing the pin, or mapping, between them. The element will no longer appear in Pin Manager. If you pin the element again, it will appear as a new tile.

To remove a pinned element (a tile) from a dashboard, you’ll need to do that in Power BI. In the tile you want to delete, click the Open menu icon and then click Delete tile .

Contact us for PowerBI information at office365@atidan.com

Blog post content credit to the Microsoft PowerBI team!

 

Brainstorm Training for Microsoft Office 365 – QuickHelp Video Training On Demand

With the launch of Office 2016 and so many other new features in Office 365, Atidan is pleased to partner with Brainstorm to offer their QuickHelp On Demand training offering. Brainstorm QuickHelp is a subscription that gives all of your users access to video and reference help right within the applications such as Outlook, Word, Excel, Powerpoint, etc. for immediate help and training right when and where they need it. The subscription includes:

  • Thousands of short, on-demand training clips hosted in the cloud
  • Searchable content complete with course lists, favorites, and Quick Ribbon tips
  • Simultaneous see-and-do clips, delivered immediately

Please check out the enclosed video demo link:  Brainstorm QuickHelp Video Demonstration

Brainstorm Training Vision - Atidan

Videos are added very frequently as new products and features are released. Also included is a live monthly training called QuickCasts at no additional charge.

Initial subscription covers up to 50 users and is extremely cost effective. Additional subscriptions can be purchased individually and volume licensing is available. Considering the cost of training and your staff’s productivity , we believe this will pay for itself many times over!

Brainstorm QuickCast-Screenshots - Atidan

Why Quickcasts work:

  • Assisted help desk functionality to answer questions via videos
  • Customized course lists and the ability to add corporate content
  • Relevant reports on personal, company, and help desk usage

What’s in it for you

  • Improve user adoption, ease transition
  • Increase usage, increase productivity
  • Decrease help desk calls and costs
  • Increase software ROI

Another offering from Brainstorm are their ‘famous’ Quick Start reference cards. These are orderable in any quantity for any topic including Office 365, all of the Office 2016 products, Windows 10, etc.

QuickReference Cards Brainstorm - Atidan

Contact the Atidan team today at brainstorm@atidan.com !

Introducing Office 2016 – Takes the Work Out of Working Together

Atidan is pleased to announce the availability of Office 2016! Contact us for a free trial and no obligation briefing and demonstration.

The New Microsoft Office 2016 Video

Office 2016 Product Guide

The new Office—takes the work out of working together

Check out additional resources on our SlideShare site here

The new office is here 1 v2

Collaboration is the way we get things done in the workplace, but the process itself can be complex and frustrating. It shouldn’t have to be. We set out to make working together easier and more impactful by building a suite of integrated apps and services that removes barriers and empowers teams to do and achieve more.

Office has always been the go-to tool for helping individuals do their best work—whether for professional documents, powerful analyses or school presentations. People often start and end their work in Office, but there is often a messy middle that involves a lot of discussion—in person, by phone or via various tools—as well as multiple (sometimes conflicting) inputs. Today we are delivering a set of experiences that is built for making teamwork seamless.

  • We have had real-time co-authoring in all of our web apps since 2013. We are now taking it to the next level by building it directly into our native apps. With this release, we’re making co-authoring in Word real-time, which lets you see what others are writing immediately, as it happens. We’re committed to expanding real-time co-authoring to each of our native apps and you should expect to see more over time.
  • Skype for Business is now available in the client apps, allowing you to IM, screen share, talk or video chat right in your docs. This same experience will be coming to Office Online later this fall. Skype for Business also has faster screen sharing and now adds the option to start a real-time co-authoring session from any conversation or meeting.
  • Keep teams connected with Office 365 Groups, now available as part of Outlook 2016 and in a new Outlook Groups app on iOS, Android and Windows Phone. Office 365 Groups allows individuals to easily create public or private teams. Each group includes a shared inbox, calendar, cloud storage for group files, and a shared OneNote notebook to keep the team productive.
  • Office 365 Planner helps teams organize their work, with the ability to create new plans, organize and assign tasks, set due dates and update status. Planner’s visual dashboards and email notifications help keep everyone informed on the overall progress of their initiative. Planner will be available in preview, to Office 365 First Release customers, starting next quarter.
  • Originally unveiled earlier this year, GigJam is today available in private preview and will become part of Office 365 in 2016. GigJam is an unprecedented new way for teams to accomplish tasks and transform business processes by breaking down the barriers between devices, apps and people.

Works for you

Excel 2016 - New Sunburst Chart

In today’s workplace, we have too many apps, devices and tools—we just don’t have enough time. Office is built to cut through the complexity and help you get things done quickly. It gets work done for you.

Some highlights:

  • Outlook 2016 provides the smartest inbox yet, delivering lightning fast search, removing low priority mail automatically and making sure everyone on the To: line has the right access to modern, cloud-based attachments from OneDrive.
  • We also have significant new updates to OneDrive for Business coming later this month across sync, browser, mobile, IT control and developer experiences. The highlight is the preview of the next generation sync client for Windows and Mac, offering improved reliability and selective sync, as well as increased file size and volume limits.
  • Tell Me helps you easily and quickly find the right Office feature or command, and Smart Lookup brings insights from the web right into your documents. Previously available only in Office Online, Tell Me and Smart Lookup are now available across the Office 2016 client apps.
  • Excel 2016 now includes integrated publishing to Power BI and new modern chart-types to help you make the most of your data.
  • The cloud-powered most recently used documents list allows you to pick up right where you left off in seconds, because it travels with you across your devices whether working in Office Online, the mobile apps or in the 2016 client apps.

Perfect with Windows 10

Office Mobile Apps

While we have made a lot of progress on delivering Office cross-platform and remain committed to this strategy, we see Windows as being “home” for Office. Together, Office 365 and Windows 10 are the most complete solution for getting things done.

  • With Windows Hello, you can sign in to your PC and Office 365 with a simple look or a touch—no need to type a password anymore.
  • In the coming months, Cortana will get even smarter and more useful by gathering intelligence on Office 365 through Outlook 2016 support.
  • With the Office Mobile apps and Continuum on Windows phones, your phone can act as a desktop, and you can project, create or edit your presentation or sales budget from your phone to a monitor while using the same phone to take notes with the OneNote app.
  • Sway on Windows 10 was released in early August, and customers love it! It has received a 4.6/5 app store ranking, with customers calling it “funtastic,” “brilliant” and “elegant.” People love Sway’s simple and intuitive yet powerful approach to next-generation digital storytelling. With Sway, you provide the content, and Sway creates beautiful, shareable, interactive stories that look great on any screen.
  • Our Office Mobile apps on Windows 10 (Word, Excel and PowerPoint) have also received great reception and 4+ app store rankings. Customers say they are touch-friendly, fast and easy to use—making them wonderful for on-the-go-productivity.

For the enterprise

Office 2016 apps with Office 365 provide the most secure Office yet. We are adding built-in Data Loss Prevention (DLP) to significantly reduce the risk of leaking sensitive data by giving IT admins tools to centrally create, manage and enforce policies for content authoring and document sharing. Multifactor Authentication ensures secure access to content anywhere when employees are away from the corporate network. We are also adding Information Rights Management to Visio. Then, later this year, we will be enabling Enterprise Data Protection (EDP) in Windows 10, with support in Office Mobile, which allows more secure corporate content sharing across corporate managed apps and network/cloud locations, preventing inadvertent content sharing outside corporate boundaries. We will be following up with EDP for Office on Windows desktop in early 2016.

Some of our favorite enterprise apps—Visio and Project—have been updated as well. With Visio 2016, customers can get started quickly with diagramming using starter diagrams and contextual tips. Bringing process models, manufacturing plant or IT architecture to life is now only one step away. With Project 2016, customers can streamline resource engagement processes, manage resource pools through visual heat maps, benefit from multiple timelines, and create custom experiences in Project desktop through write-back capabilities for add-ins.

We are also making deployment easier and have a lot of other new capabilities to help IT professionals, as announced in our blog to IT admins two weeks ago.

Office 2016 for Mac

The new office is here 4 v2

In July, we released Office 2016 for Mac to our Office 365 customers. The new versions of Word, Excel, PowerPoint, Outlook and OneNote are a significant step forward and provide the best of both worlds for Mac users—a familiar Office experience paired with the best of the Mac platform. The level of engagement and excitement about this Office for Mac release has surpassed our expectations. As promised in July, today Office 2016 for Mac will also be available as a one-time purchase.

Ongoing Office 365 customer value

Whew—that was a lot! With this release, we’re also shifting the cadence of Office on Windows to feel much more like the cadence we have on the Office mobile and web apps, which release every month with new value. So, going forward, Office 365 customers will now enjoy new features and capabilities delivered continuously in the Office desktop applications as part of their subscription. It’s a new day for our desktop apps.

And as a bit of a teaser, here are some examples of the types of new features and benefits Office 365 customers will see:

  • Built for teamwork—We’re committed to rich co-authoring and collaboration across our native clients, starting in Word 2016, with other big advances in collaboration coming throughout the year. By the end of 2015, we’ll introduce Office 365 Groups insights and discovery in Office Delve. In addition, we will have a new generation of personal work analytics in Office Delve that will help individuals, teams and organizations to be more effective at work by understanding their reach and impact, time allocation and network.
  • Works for you—For those of you who are excited about our new charts and forecasting, we will be delivering even more new charts, formulas, connectors and other Excel capabilities throughout the year. (If you’re interested in telling us which charts and charting features you’d like to see first, we have a survey going on our Excel Facebook page right now!). We also have some very exciting developments in the pipeline for PowerPoint that will make it easier to design and deliver presentations to wow and engage your audience.
  • Perfect on Windows 10—We are especially excited about how even more valuable your personal assistant Cortana can be when she has insights from Office. This will start to allow Cortana to unlock helpful scenarios, like retrieving your documents, preparing for meetings and more.

Get started

  • Are you ready to purchase? Buy Office 365 to get the new 2016 apps – contact us at office365@atidan.com
  • Are you an existing Office 365 customer who is ready to upgrade? Go here for instructions on how to upgrade to Office 2016 for home, or here for business customers.
  • Do you want to learn more? Go to office.com/2016.

Credit for this blog content is from Microsoft blog https://blogs.office.com/2015/09/22/thenewoffice/

Microsoft Power BI Launched – Bring Your Data to Life

Atidan is pleased to be a part of the business intelligence revolution – for over ten years we have worked with powerful visualization tools starting with the early versions of SQL Server and many third party tools. Now, it is even easier than ever to collect data from almost ‘anywhere’, create ‘stories’ from your data, create insights with simple drag-and-drop gestures, and share your dashboards and metrics internally and externally. Contact us today to see the latest release of Microsoft Power BI.
Microsoft Power BI is a collection of online services and features that enables you to find and visualize data, share discoveries, and collaborate in intuitive new ways. There are two experiences now available for Power BI: the new experience, generally referred to as Power BI, and the previous experience which is referred to as Power BI for Office 365.Power BI

The new experience is centered on PowerBI.com, an online service where you can quickly create dashboards, share reports, and directly connect to (and incorporate) all the data that’s important to you. The new experience also introduces the Power BI Desktop, a dedicated report authoring tool that enables you to transform data, create powerful reports and visualizations, and easily publish to Power BI. The new experience extends to all your mobile devices, too.

Note   If you were using the previous experience and want to migrate to the new experience, you can download this migration guide to understand and prepare for the migration process.

The following sections provide an overview of the new experience, along with introductions and links to learn more about the details and capabilities of each offering.

Power BI – the new experience

In the new experience, Power BI displays dashboards on the Power BI service that are interactive, and can be created and updated from many different data sources. Three elements are primary to the new experience:

In the new experience, you create dashboards that keep you informed about what’s most important about your business. Just like a dashboard in your car displays important information about your vehicle, such as its speed, its fuel level, or how healthy the engine is, dashboard in Power BI display important information about your business.

Dashboard in Power BIIn Power BI, dashboards display tiles that represent important information about your business. Tiles are based on reports (just like gauges are based on vehicle data from the engine, the fuel tank, or the battery). You can select a tile to explore more information from the underlying report.

Power BI - dashboard to reportThe new experience also introduces a dedicated report authoring tool:

With Power BI Desktop, you get a powerful and dedicated report authoring tool that enables you to connect to and combine data from lots of different sources, using Power BI Desktop’s Query Editor. From the datasets you build with Query Editor you can create rich reports and visualization within Power BI Desktop. And when you’re done, publishing to the Power BI service is easy.

Power BI DesktopThere’s a lot more to learn about the new experience, and plenty of content to help you . The following links provide information about the new experience, and start with overview information (to get your familiar, oriented, and comfortable) then move into specifics. A separate section is dedicated to the Power BI Desktop, and those links also go from overview to specifics.

Getting Started – the new experience

The following links provide guidance on the new experience in Power BI.

Using Power BI Desktop

The following links start with an overview of Power BI Desktop, then describe common tasks and capabilities that Power BI Desktop provides.

Microsoft Office 2016 for Apple Mac Launched

There’s a lot to love in the new Office for Mac

Microsoft announced that a totally redesigned and rewritten version of Office is now available for the Macintosh for all Office 365 subscribers who have the Office subscription (E3, Business, Business Premium, etc.). Including Word, Excel, PowerPoint, Outlook and OneNote, every application now has a metro interface and many new features.  Please contact the Atidan team to learn more at office365@atidan.com
A MacBook showing a new Word for Mac document with the navigation pane.

Word

Create, polish, and share beautiful and professional documents
The state-of-the-art authoring and reviewing tools in Word make the creation of polished documents easy. The new Insights pane shows relevant contextual information from the web inside Word.
The Design tab enables you to manage layout, colors, and fonts across a document.
Get more done by working together using built-in tools to share and review documents. Several people can work on the same document simultaneously and use threaded comments to have a conversation right next to relevant text.

Excel

Analyze and visualize numbers in new and intuitive ways
The new Excel for Mac enables you to turn numbers into insights. Familiar keyboard shortcuts and data entry enhancements like formula builder and autocomplete immediately make you more productive.
Excel also helps you visualize your data by recommending charts best suited for your numbers, and letting you quickly preview the different options. New PivotTable Slicers help you discover patterns in large volumes of data.
A MacBook showing a new Excel for Mac spreadsheet with charts.
A MacBook showing a new PowerPoint for Mac presentation.

PowerPoint

Create multimedia presentations and present your ideas with confidence
Walk into your next presentation with complete confidence. The new Presenter View in PowerPoint displays the current slide, next slide, speaker notes, and a timer on your Mac, while projecting only the presentation to your audience on the big screen.
A new Animation pane helps you design and fine-tune animations, and refined slide transitions ensure the finished product is polished.
Easily share your presentation and invite others to work on the same presentation simultaneously.

OneNote

Harness your thoughts in your very own digital notebook
Capture, organize, and share your ideas with digital notebooks that you can access on any device. Find things quickly with a powerful search engine that tracks your tags, indexes your typed notes, and recognizes text in images and handwritten notes.
Bold, italicize, underline, highlight, insert files, pictures and tables—format your notes like you want.
Easily share notebooks with friends, family, or colleagues so everyone can work together on travel plans, household tasks, or work projects.
A MacBook showing a notebook open in the new OneNote for Mac.
A MacBook showing an inbox in the new Outlook for Mac.

Outlook

Stay organized with an email and calendar experience that’s fast and looks great
Managing your email, calendar, contacts, and tasks has never been easier. The new Outlook for Mac has push mail support so your inbox is always up to date.
The improved conversation view automatically organizes your inbox around threaded conversations, so you’ll never hunt for related messages again. And the new message preview gives you the first sentence of an email just below the subject line, so you can quickly decide if you want to read it now or come back later.

Power Map for Excel–February update for Office 365

New Power Map for Excel Features

Data cards–Tooltips the way you want them!

Data cards extends the Power Map tooltip to let you display more contextual information about the data on your map. This becomes powerful when you need to quickly drill down into the details behind the columns and pie chart visualizations, like showing a list of Olympic events and associated medal count spanning a time period for a specific country. Just hover over or click any visualization on the map and a data card is displayed with easily readable information about the geo, height, category and time related to the data point.

Power Map Feb 1

You can also customize the data cards to fit your data exploration and presentation needs. Some of the options available are:

  • Add/remove/change aggregation of data fields
  • Add a title (e.g. location name)
  • Re-order the data fields
  • Rename the data field headers
  • Select from a set of layout templates

Below is an example of a customized data card. Notice that event a series of data (from aggregate values) can be displayed, allowing you to drill down into the more interesting details of the data on your map.

Power Map Feb 2

Learn more about how to create and customize data cards and let us know what you think about this new feature.

Heat map improvements

Using heat map visualizations in Power Map tours is great way to see trends and outliers in your data through a color scale. Below is a tour scene with traditional heat map visualizations. This month’s update comes with a few cool new ways you can use heat maps to represent the data just the way you want.

Changing aggregation type

The traditional heat map only showed sum data. Now you can use heat maps to represent other aggregation types, like average. By changing the aggregation type, an option in the Layer Options tab, we were able to display the average cost of new construction projects in the greater Seattle Area.

Power Map Feb 3

Power Map Feb 4

Customizing the color scale

The traditional heat map displayed points with a color scale ranging from blue to green, yellow and red. To better represent your data, it may be more effective to display your data across just a portion of this color scale or choose an entirely different color scale all together. Also under the Layer Options tab, you will find an option to change the color of your heat map visualizations. Select Custom in the drop down and then just choose the colors you want to include. Using the same tour scene, we changed the heat map colors to just blue and red.

Power Map Feb 5

With these new customizations, what you can do with heat map visualizations is really limited to the data you have. Imagine an accumulated image of precipitation measurements of over 30 years can be made to look like with these options in Power Map.

Power Map Feb 6

 


Power Map for Excel is available with any Office 365 subscription that includes Office desktop apps. To use Power Map, open Excel and go to Insert > Map.

If you have automatic updates enabled for your Office 365 subscription, you should receive this Power Map update soon. If you don’t have automatic updates enabled in Excel, go to File > Account > Update Options > Update Now.

Learn about all the powerful analytics and visualization features in Excel and take your analysis further by sharing and collaborating on business insights with colleagues using Power BI.

Fuzzy LookUp for Excel – Prelease Download – Identify Textually Similar Data

Microsoft Fuzzy Lookup Add-In for Excel

A challenging problem in data management is that the same entity may be represented in multiple ways throughout the dataset. For instance, customer “Andy Hill” might also be present as “Mr. Andrew Hill” or “Hill, Andrew R.”. Variations can result from merging independent data sources, spelling mistakes, inconsistent naming conventions and abbreviations, or records with additional/missing information.

Microsoft Fuzzy Lookup technology, developed by Microsoft Research, allows you to quickly identify data records which are textually similar. You can identify fuzzy duplicates within a single table or perform a fuzzy join between two different tables. The default configuration works well for a wide variety of data, but the matching may also be customized for specific domains.

Fuzzy Lookup for Excel Beta - Atidan

Installation

System Requirements

The add-in works on any Microsoft Windows operating system with Microsoft Excel 2010 or newer installed.

Installation Steps

Close any open instances of Excel. Uninstall any existing versions of Microsoft Fuzzy Lookup Add-in For Excel using the Windows control panel. Navigate to http://www.microsoft.com/en-us/download/details.aspx?id=15011, then download and run Setup.exe to launch the installation wizard.

If Setup.exe is run with administrator privileges, it will be installed for all users on the machine. It will be installed only for the current user otherwise.

Once successfully installed, launch Excel and you should see a Fuzzy Lookup tab in the Excel Ribbon bar.

See Portfolio Sample section for an introduction on how to use the add-in.

Installation Folder

The add-in, documentation and samples will be installed to the folder:

If the installer is run without administrator privileges,

%LOCALAPPDATA%\Microsoft\Fuzzy Lookup Add-In For Excel\         %LOCALAPPDATA% is typically C:\Users\%USERNAME%\AppData\Local

If the installer is run as administrator on a 32-bit operating system,

%ProgramFiles%\Microsoft\Fuzzy Lookup Add-In For Excel\         %ProgramFiles% is typically C:\Program Files

If the installer is run as administrator on a 64-bit operating system,

%ProgramFiles(x86)%\Microsoft\Fuzzy Lookup Add-In For Excel\         %ProgramFiles(x86)% is typically C:\Program Files (x86)

Portfolio Sample

This section describes how to use the Fuzzy Lookup Add-In for Excel with the spreadsheet Portfolio.xlsx which is located in the installation folder.

Imagine you have a stock portfolio described by two columns Company and Shares and that you are interested in computing the average price/earnings (P/E) ratio of the companies in the portfolio. To do this, you need to join your portfolio table with another table containing P/E ratios. The spreadsheet contains a second tab called SP500 which contains company data imported from the stock screener at the http://finviz.com website. Looking at the data, one can see that an exact join on the Company columns of the two tables would fail as the string representations of the companies differ (e.g., “AMAZON COM INC STK” and “Amazon.com Inc.”).

A fuzzy join of the two tables can be performed as follows:

  1. Turn the each data range into an Excel table by selecting a region and pressing CTRL-L. You can assign a name to the table clicking on it and selecting the Design tab in the Excel ribbon.
  2. Open the Fuzzy Lookup pane by clicking on the Fuzzy Lookup button in the Fuzzy Lookup tab of the Excel ribbon.
  3. Pick the left and right tables from the drop down menus. Matching rows from the right table will be returned for each row in the left table.
  4. Select the columns to match on. If the two tables share one or more column names in common, a default join will already have been added. If you wish to match on different columns, first delete the existing join by pressing the “X” button on the join row in the Match Columns table. To create a new column binding, select one or more columns from each table (multiple columns may be selected by holding down SHIFT or CTRL and click on the column names). Next, press the button in between the two lists of columns to add a row to the Match Columns table.
  5. Select one or more output columns to be output for each match.
  6. Select the maximum number of matches to be returned for each left row.
  7. Set the similarity threshold. All matches returned must have a similarity greater than or equal to this value.
  8. Move the current cell selected in the Excel spreadsheet to an empty cell which has empty space to the right and below it. The Fuzzy Lookup matches will be output starting at this cell.
  9. Press the “Go” button to perform the match.

One should see the results as indicated in the screenshot above. Notice that each returned match includes a similarity score indicating how close the two records are. 1.0 means an exact match while lower scores indicate less similarity.

Note that Fuzzy Lookup can also be used to identify matches in a single table by setting the left and right tables to be the same.

Advanced Concepts

Fuzzy Lookup technology is based upon a very simple, yet flexible measure of similarity between two records.

Jaccard similarity

Fuzzy Lookup uses Jaccard similarity, which is defined as the size of the set intersection divided by the size of the set union for two sets of objects. For example, the sets {a, b, c} and {a, c, d} have a Jaccard similarity of 2/4 = 0.5 because the intersection is {a, c} and the union is {a, b, c, d}. The more that the two sets have in common, the closer the Jaccard similarity will be to 1.0.

Weighted Jaccard similarity and tokenization of records

With Fuzzy Lookup, you can assign weights to each item in a set and define the weighted Jaccard similarity as the total weight of the intersection divided by the total weight of the union. For the weighted sets {(a, 2), (b, 5), (c, 3)}, {(a, 2), (c, 3), (d, 7)}, the weighted Jaccard similariyt is (2 + 3)/(2 + 3 + 5 +7) = 5/17 = .294.

Because Jaccard similarity is defined over sets, Fuzzy Lookup must first convert data records to sets before it calculates the Jaccard similarity. Fuzzy Lookup converts the data to sets using a Tokenizer. For example, the record {“Jesper Aaberg”, “4567 Main Street”} might be tokenized into the set, {“ Jesper”, “Aaberg”, “4567”, “Main”, “Street”}. The default tokenizer is for English text, but one may change the LocaleId property in Configure=>Global Settings to specify tokenizers for other languages.

Token weighting

Because not all tokens are of equal importance, Fuzzy Lookup assigns weights to tokens. Tokens are assigned high weights if they occur infrequently in a sample of records and low weights if they occur frequently. For example, frequent words such as “Corporation” might be given lower weight, while less frequent words such as “Abracadabra” might be given a higher weight. One may override the default token weights by supplying their own table of token weights.

Transformations

Transformations greatly increase the power of Jaccard similarity by allowing tokens to be converted from one string to another. For instance, one might know that the name “Bob” can be converted to “Robert”; that “USA” is the same as “United States”; or that “Missispi” is a misspelling of “Mississippi”. There are many classes of such transformations that Fuzzy Lookup handles automatically such as spelling mistakes (using Edit Transformations described below), string prefixes, and string merge/split operations. You can also specify a table containing your own custom transformations.

Jaccard similarity under transformations

The Jaccard similarity under transformations is the maximum Jaccard similarity between any two transformations of each set. Given a set of transformation rules, all possible transformations of the set are considered. For example, for the sets {a, b, c} and {a, c, d} and the transformation rules {b=>d, d=>e}, the Jaccard similarity is computed as follows: Variations of {a, b, c}: {a, b, c}, {a, d, c} Variations of {a, c, d}: {a, c, d}, {a, c, e} Maximum Jaccard similarity between all pairs: J({a, b, c}, {a, c, d}) = 2/4 = 0.5 J({a, b, c}, {a, c, e}) = 2/4 = 0.5 J({a, d, c}, {a, c, d}) = 3/3 = 1.0 J({a, d, c}, {a, c, e}) = 2/4 = 0.5 The maximum is 1.0. Note: Weghted Jaccard similiary under transformations is simply the maximum weighted Jaccard similarity across all pairs of transformed sets.

Edit distance

Edit distance is the total number of character insertions, deletions, or substitutions that it takes to convert one string to another. For example, the edit distance between “misissipi” and “mississippi” is 2 because two character insertions are required. One of the transformation providers that’s included with Fuzzy Lookup is the EditTransformationProvider, which generates specific transformations for each input record and creates a transformation from the token to all words in its dictionary that are within a given edit distance. The normalized edit distance is the edit distance divided by the length of the input string. In the previous example, the normalized edit distance is 2/9 = .222.

Technical resources

For more technical details on Fuzzy Lookup, see the following resources:

Microsoft Research Data Cleaning Project

Transformation-based Framework for Record Matching

Efficient Exact Set-Similarity Joins

Help and Support

The Fuzzy Lookup Add-in for Excel is pre-release software and no support is officially provided by Microsoft.

A forum for questions about the add-in is available here: http://blogs.msdn.com/b/business_intelligence_labs/archive/2011/04/27/fuzzy-lookup-add-in-for-excel.aspx

Questions and comments may also be sent to dcrt@microsoft.com

This document is provided “as-is”.  Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. Some examples depicted herein are provided for illustration only and are fictitious.  No real association or connection is intended or should be inferred.  This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2005-2014 Microsoft. All rights reserved.