
Extracting and Querying a Comprehensive Web Database - M. Cafarella, UWS

Existing web databases lose a tremendous amount of information because they try to fit crawled data into a specific, predefined domain. Anything that does not match that domain is 'forced' into it or lost.

This paper proposes an improved extraction model for web databases.

Architecture

Web crawl ---> Multimodel Extraction --> Entity Database --> Multimodel Transaction ---> User Query
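To make the data flow above concrete, here is a minimal in-memory sketch of the pipeline. It is not the paper's implementation: the crawled pages are a hypothetical hard-coded list, the extractor is a toy regex matcher, and the final two stages are collapsed into a single query function. The point it illustrates is that no domain is fixed in advance; any attribute name found in the text becomes part of the entity database.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for crawled pages (the real system crawls the web).
PAGES = [
    "Seattle is a city. Seattle has population 737000.",
    "Ann Arbor is a city.",
]

# Toy extraction patterns (assumption, for illustration only).
IS_A = re.compile(r"^(?P<e>.+?) is a (?P<v>\w+)$")
HAS = re.compile(r"^(?P<e>.+?) has (?P<a>\w+) (?P<v>\S+)$")

def extract(page):
    """Toy extractor: yields (entity, attribute, value) triples.
    Attributes are not restricted to a predefined schema."""
    for sentence in page.split("."):
        sentence = sentence.strip()
        if m := IS_A.match(sentence):
            yield m["e"], "type", m["v"]
        elif m := HAS.match(sentence):
            yield m["e"], m["a"], m["v"]

def build_entity_database(pages):
    # entity -> attribute -> set of values
    db = defaultdict(lambda: defaultdict(set))
    for page in pages:                                 # web crawl output
        for entity, attr, value in extract(page):      # extraction
            db[entity][attr].add(value)                # entity database
    return db

def user_query(db, attr, value):
    """Return every entity whose `attr` includes `value`."""
    return sorted(e for e, attrs in db.items() if value in attrs.get(attr, ()))

db = build_entity_database(PAGES)
print(user_query(db, "type", "city"))  # ['Ann Arbor', 'Seattle']
```

Storing every attribute as a set of values keeps conflicting extractions side by side instead of discarding them, which is the information the fixed-domain approach would lose.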


Instead, the paper proposes a dynamic domain generation approach.

This approach faces the following challenges:

a) Web Extraction - Generating the E-R model is challenging because different domains may be generated for the same topic. For example, one page may mention "George W Bush" and another "President George Bush": how do you know that these refer to the same entity? Data reconciliation is a huge issue.

(Dong, Halevy and Madhavan - Reference Reconciliation in Complex Information Spaces)
(Singla and Domingos - Entity Resolution with Markov Logic)
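To show why reconciliation is hard, here is a deliberately naive sketch. It is not the method of either cited paper; it swaps in plain token-overlap (Jaccard) similarity after stripping a hypothetical list of titles, and clusters names greedily against a hypothetical threshold of 0.6.

```python
import re

# Hypothetical honorifics to ignore; a real system would need far richer evidence.
TITLES = {"president", "mr", "dr", "senator", "governor"}

def tokens(name):
    """Lower-case the name, drop punctuation and titles, return a token set."""
    words = re.findall(r"[a-z]+", name.lower())
    return {w for w in words if w not in TITLES}

def jaccard(a, b):
    """Token-set Jaccard similarity between two names."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def reconcile(names, threshold=0.6):
    """Greedy clustering: add each name to the first cluster whose first
    member is similar enough, otherwise start a new cluster."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if jaccard(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

print(reconcile(["George W Bush", "President George Bush", "George Washington"]))
# [['George W Bush', 'President George Bush'], ['George Washington']]
```

The same threshold would also merge "George H W Bush" into the first cluster (similarity 0.75), a wrong merge that string similarity alone cannot avoid; this is the kind of mistake the approaches cited above are designed to handle.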

b) Entity-Relation Database - The component that stores the entities extracted from the web. There are two ways to query the system:

1. Structured Query - Queries a specific table / domain.

2. Unstructured Query - Queries that span multiple tables, looking for rows that match the given criteria.
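The difference between the two query styles can be sketched over a hypothetical set of extracted tables. The table names, columns, and the keyword-matching rule for unstructured queries are all assumptions for illustration; the paper's actual query semantics may differ.

```python
# Hypothetical extracted tables: table name -> list of rows (dicts).
TABLES = {
    "presidents": [
        {"name": "George W Bush", "party": "Republican", "state": "Texas"},
        {"name": "Bill Clinton", "party": "Democratic", "state": "Arkansas"},
    ],
    "cities": [
        {"name": "Austin", "state": "Texas"},
        {"name": "Seattle", "state": "Washington"},
    ],
}

def structured_query(tables, table, **criteria):
    """Structured: query one known table, returning rows whose columns
    equal every given criterion."""
    return [row for row in tables[table]
            if all(row.get(col) == val for col, val in criteria.items())]

def unstructured_query(tables, keyword):
    """Unstructured: scan every table and return (table, row) pairs for rows
    with any cell containing the keyword (case-insensitive)."""
    kw = keyword.lower()
    hits = []
    for name, rows in tables.items():
        for row in rows:
            if any(kw in str(v).lower() for v in row.values()):
                hits.append((name, row))
    return hits

print(structured_query(TABLES, "presidents", party="Republican"))
print(unstructured_query(TABLES, "texas"))
```

The structured query needs the user to know the table and column names; the unstructured one does not, at the cost of returning rows from unrelated tables (here both a president and a city match "texas").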

c) Query Processing - The interface that accepts user requests; it supports both structured and unstructured queries. Results are stored as an on-the-fly table, and the interface also takes feedback from the user to make the results more accurate.
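Below is one possible sketch of building an on-the-fly result table and refining it with user interaction. The feedback mechanism is an assumption, not taken from the paper: here the user flags a source as unreliable, the scores of its facts are scaled down by a hypothetical `penalty`, and the table is rebuilt. `Candidate`, `min_score`, and the source names are all hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass
class Candidate:
    # Hypothetical extracted fact with an extractor confidence score.
    entity: str
    attribute: str
    value: str
    source: str
    score: float

def on_the_fly_table(candidates, attributes, min_score=0.5):
    """Pivot facts into a table: one row per entity, one column per requested
    attribute, keeping the highest-scoring value for each cell."""
    best = {}
    for c in candidates:
        if c.attribute in attributes and c.score >= min_score:
            key = (c.entity, c.attribute)
            if key not in best or c.score > best[key].score:
                best[key] = c
    rows = {}
    for (entity, attr), c in best.items():
        rows.setdefault(entity, {a: None for a in attributes})[attr] = c.value
    return [{"entity": e, **cols} for e, cols in sorted(rows.items())]

def apply_feedback(candidates, rejected_sources, penalty=0.5):
    """User interaction (assumed form): down-weight every fact whose
    source the user has flagged as unreliable."""
    return [replace(c, score=c.score * penalty) if c.source in rejected_sources else c
            for c in candidates]

candidates = [
    Candidate("Seattle", "state", "Washington", "siteA", 0.9),
    Candidate("Seattle", "population", "737000", "siteA", 0.8),
    Candidate("Seattle", "population", "9999999", "spamsite", 0.95),
    Candidate("Austin", "state", "Texas", "siteB", 0.7),
]
table = on_the_fly_table(candidates, ["state", "population"])
fixed = on_the_fly_table(apply_feedback(candidates, {"spamsite"}), ["state", "population"])
print(table)
print(fixed)
```

Before feedback, the high-scoring spam value wins the Seattle population cell; after the user flags `spamsite`, its score drops below `min_score` and the table is rebuilt with the value from `siteA`.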


Other related work

KnowItAll

TextRunner

WebTables

WeakAssoc
