SQL: How to remove duplicate rows from a table in SQL Server

http://support.microsoft.com/kb/139444

For this example, we will use the following table with duplicate PK values. In this table, the primary key consists of two columns (col1, col2). We cannot create a unique index or PRIMARY KEY constraint, since two rows have duplicate PKs. This procedure illustrates how to identify and remove the duplicates.

create table t1(col1 int, col2 int, col3 char(50))

insert into t1 values (1, 1, 'data value one')
insert into t1 values (1, 1, 'data value one')
insert into t1 values (1, 2, 'data value two')

The first step is to identify which rows have duplicate primary key values:

SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1

This will return one row for each set of duplicate PK values in the table. The last column in this result is the number of duplicates for the particular PK value.


col1  col2  count
1     1     2

If there are only a few sets of duplicate PK values, the best procedure is to delete these manually on an individual basis. For example:

set rowcount 1
delete from t1
where col1=1 and col2=1

The rowcount value should be set to n-1, where n is the number of duplicates for a given key value. In this example there are 2 duplicates, so rowcount is set to 1. The col1/col2 values are taken from the GROUP BY query result above. If the GROUP BY query returns multiple rows, the delete will have to be run once for each of those rows, each time with rowcount set to one less than the number of duplicates of the particular PK value.
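For instance, if the GROUP BY query had also reported a key (2, 3) with three duplicates (hypothetical values for illustration), the sequence would be as follows; note the final reset, since SET ROWCOUNT stays in effect for the connection:

set rowcount 2
delete from t1
where col1=2 and col2=3
set rowcount 0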

Before deleting the rows, you should verify that the entire row is duplicated. While unlikely, it is possible that the PK values are duplicated, yet the row as a whole is not. An example would be a table with Social Security Number as the primary key that contains two different people (or rows) with the same number, each having unique attributes. In such a case, whatever malfunction caused the duplicate key may also have placed valid unique data in the row. This data should be copied out and preserved for study and possible reconciliation before deleting the rows.

If there are many distinct sets of duplicate PK values in the table, it may be too time-consuming to remove them individually. In this case the following procedure can be used:

1. First, run the above GROUP BY query to determine how many sets of duplicate PK values exist, and the count of duplicates for each set.

2. Select the duplicate key values into a holding table. For example:

   SELECT col1, col2, col3=count(*)
   INTO holdkey
   FROM t1
   GROUP BY col1, col2
   HAVING count(*) > 1

3. Select the duplicate rows into a holding table, eliminating duplicates in the process. For example:

   SELECT DISTINCT t1.*
   INTO holddups
   FROM t1, holdkey
   WHERE t1.col1 = holdkey.col1
   AND t1.col2 = holdkey.col2

4. At this point, the holddups table should have unique PKs; however, this will not be the case if t1 had duplicate PKs yet unique rows (as in the SSN example above). Verify that each key in holddups is unique and that you do not have duplicate keys with unique rows. If you do, you must stop here and reconcile which of the rows you wish to keep for a given duplicate key value. For example, the query:

   SELECT col1, col2, count(*)
   FROM holddups
   GROUP BY col1, col2

   should return a count of 1 for each row. If yes, proceed to step 5 below. If no, you have duplicate keys yet unique rows, and need to decide which rows to save. This will usually entail either discarding a row or creating a new unique key value for it. Take one of these two steps for each such duplicate PK in the holddups table.

5. Delete the duplicate rows from the original table. For example:

   DELETE t1
   FROM t1, holdkey
   WHERE t1.col1 = holdkey.col1
   AND t1.col2 = holdkey.col2

6. Put the unique rows back in the original table. For example:

   INSERT t1 SELECT * FROM holddups
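As an aside (not part of the original KB article): on SQL Server 2005 and later, the whole procedure can be collapsed into one statement with ROW_NUMBER(). A minimal sketch against the example table:

   ;WITH numbered AS
   (
       SELECT col1, col2, col3,
              ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS rn
       FROM t1
   )
   DELETE FROM numbered   -- deleting from the CTE deletes the rows from t1
   WHERE rn > 1

Because fully duplicated rows are identical in every column, the ORDER BY inside ROW_NUMBER() is arbitrary here; any column will do. Note that this keeps one arbitrary row per key, so the same caveat about duplicate keys with unique rows applies.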


SQL Server: Quickest Method to Create Single Table Backup

http://connectsql.blogspot.in/search/label/BCP

There are several ways to create a backup of a single table in a SQL Server database. In SQL Server 2008, you can now generate insert statements for selected tables. Besides this, the Export wizard is commonly used to create a flat file backup of a single table. Both methods are reliable but still time consuming. When I searched for the quickest method to back up a single table, I found the following code by SQL expert Fred.

USE AdventureWorks
GO
DECLARE @table VARCHAR(128),
        @file VARCHAR(255),
        @cmd VARCHAR(512)
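The post's body is truncated here, right after the declarations. For completeness, a hedged sketch of how such a bcp-based single-table backup typically continues (the table name, file path, and bcp switches are my assumptions, not necessarily Fred's original code):

SET @table = 'Sales.Currency'                 -- assumed example table
SET @file = 'C:\Backup\' + REPLACE(@table, '.', '_') + '.dat'
SET @cmd = 'bcp AdventureWorks.' + @table
         + ' out "' + @file + '" -n -T'       -- native format, trusted connection

EXEC master..xp_cmdshell @cmd                 -- requires xp_cmdshell to be enabled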

SQL: Tips For Lightning-Fast Insert Performance On SQL Server

http://arnosoftwaredev.blogspot.in/2011/10/tips-for-lightning-fast-insert.html

1. Increase ADO.NET BatchSize to eliminate unnecessary network roundtrips, e.g. SqlDataAdapter.UpdateBatchSize when working with DataAdapters, or SqlBulkCopy.BatchSize when applying SqlBulkCopy.

2. Limit the number of indices on the target table to what is really essential for query performance.

3. Place indices on master table columns referenced by the target table’s foreign keys.

4. Choose the target table’s clustered index wisely, so that inserts won’t lead to clustered index node splits. Usually an identity column (AKA "autoinc") is a good choice. If you don’t have autoinc primary keys, consider introducing a new identity column just for the sake of making it your clustered index.

5. Let the client insert into a temporary heap table first (that is, a table that has no clustered index, or no index at all). Then issue one big "insert-into-select" statement to push all of that staging table's data into the actual target table. The "insert-into-select" statement must contain an "order-by" clause which guarantees ordering by the clustered index; see the sketch after this list.

6. Apply SqlBulkCopy.

7. Decrease transaction logging by choosing the bulk-logged recovery model, or by setting SQL Server trace flag 610.

8. If your business scenario allows for it, place a table lock before inserting. This will make any further locking unnecessary, and is especially a viable option on staging tables as described in (5). SqlBulkCopy also supports table locks via SqlBulkCopyOptions.

9. Place database datafile and logfile on two physically separated devices, e.g. on two disks or two SAN LUNs configured for different spindles.

10. Prefer server-side processing (e.g. by means of "insert-into-select") to client-to-server roundtrips wherever possible.

11. This is probably the fastest insert approach I have ever heard of (taken from this sqlbi whitepaper, see final paragraph): create a new heap table just for the current insert batch, SqlBulkCopy data into that table, then create a suited clustered index on the table, and add the table as a new partition to an existing partitioned table.

12. Check the execution plan when inserting, and make sure it does not contain anything unexpected or dispensable that might slow down your inserts, e.g. UDF calls during check constraint execution, heavyweight trigger code, referential integrity checks without index usage, or indexed view updates.
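To make tip 5 concrete, here is a minimal T-SQL sketch (table and column names are invented for illustration; dbo.Orders is assumed to be clustered on OrderID):

-- Staging heap: no clustered index, no secondary indexes
CREATE TABLE dbo.OrdersStaging
(
    OrderID   INT      NOT NULL,
    OrderDate DATETIME NOT NULL,
    Amount    MONEY    NOT NULL
)

-- ... the client bulk-loads dbo.OrdersStaging here, e.g. via SqlBulkCopy ...

-- One big server-side statement, ordered by the target's clustered index key
INSERT INTO dbo.Orders WITH (TABLOCK) (OrderID, OrderDate, Amount)
SELECT OrderID, OrderDate, Amount
FROM dbo.OrdersStaging
ORDER BY OrderID

TRUNCATE TABLE dbo.OrdersStaging

The WITH (TABLOCK) hint ties in with tip 8, and under the bulk-logged recovery model (tip 7) it can help the insert qualify for minimal logging.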

SQL Server 2005: Immediate Deadlock notifications (example)

http://weblogs.sqlteam.com/mladenp/archive/2008/05/21/SQL-Server-2005-Immediate-Deadlock-notifications.aspx

Deadlocks… huh??

Deadlocks can be a pain to debug since they're so rare and unpredictable. The problem lies in reproducing them in your dev environment. That's why it's crucial to have as much information about them from the production environment as possible.

There are two ways to monitor deadlocks, which I'll talk about in future posts: SQL Server tracing and error log checking. Unfortunately, both suffer from the same problem: you don't know immediately when a deadlock occurs. Getting this information as soon as possible is sometimes crucial in production environments. Sure, you can always turn trace flag 1222 on, but this still doesn't solve the immediate-notification problem.

One problem for some might be that this method is only truly useful if you limit data access to stored procedures. <joke> So all you ORM lovers stop reading since this doesn’t apply to you anymore! </joke>

The other problem is that it requires a rewrite of the problematic stored procedures to support it. However, since SQL Server 2005 came out, my opinion is that every stored procedure should have a try … catch block implemented. There's no visible performance hit from this, and the benefits can be huge. One of those benefits is instant deadlock notification.

Needed "infrastructure"

So let's see how it's done. This must be implemented in the database you wish to monitor, of course.

First we need a view that will get lock info about the deadlock that just happened. You can read why this type of query gives the info we need in my previous post.

CREATE VIEW vLocks
AS
SELECT L.request_session_id AS SPID,
       DB_NAME(L.resource_database_id) AS DatabaseName,
       O.Name AS LockedObjectName,
       P.object_id AS LockedObjectId,
       L.resource_type AS LockedResource,
       L.request_mode AS LockType,
       ST.text AS SqlStatementText,
       ES.login_name AS LoginName,
       ES.host_name AS HostName,
       TST.is_user_transaction AS IsUserTransaction,
       AT.name AS TransactionName
FROM sys.dm_tran_locks L
LEFT JOIN sys.partitions P ON P.hobt_id = L.resource_associated_entity_id
LEFT JOIN sys.objects O ON O.object_id = P.object_id
LEFT JOIN sys.dm_exec_sessions ES ON ES.session_id = L.request_session_id
LEFT JOIN sys.dm_tran_session_transactions TST ON ES.session_id = TST.session_id
LEFT JOIN sys.dm_tran_active_transactions AT ON TST.transaction_id = AT.transaction_id
LEFT JOIN sys.dm_exec_requests ER ON AT.transaction_id = ER.transaction_id
CROSS APPLY sys.dm_exec_sql_text(ER.sql_handle) AS ST
WHERE resource_database_id = db_id()
GO

Next we have to create our stored procedure template:

CREATE PROC <ProcedureName>
AS
BEGIN TRAN
BEGIN TRY
    <SPROC TEXT GOES HERE>
    COMMIT
END TRY
BEGIN CATCH
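    -- The original post is truncated here; what follows is a hedged
    -- reconstruction of what the CATCH block is building toward, based on
    -- the surrounding text (the exact code in the post may differ):
    IF XACT_STATE() <> 0
        ROLLBACK

    IF ERROR_NUMBER() = 1205   -- 1205 = this session was chosen as deadlock victim
    BEGIN
        -- capture lock information via the vLocks view created above
        SELECT * FROM vLocks

        -- notify immediately, e.g. via Database Mail (parameters are placeholders):
        -- EXEC msdb.dbo.sp_send_dbmail @recipients = '...', @subject = 'Deadlock!'
    END
END CATCH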

SQL SERVER – Import CSV File Into SQL Server Using Bulk Insert – Load Comma Delimited File Into SQL Server

http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/

This is a very common request recently: How to import a CSV file into SQL Server? How to load a CSV file into a SQL Server database table? How to load a comma delimited file into SQL Server? Let us see the solution in quick steps.

CSV stands for Comma Separated Values, sometimes also called Comma Delimited Values.

Create TestTable

USE TestData
GO
CREATE TABLE CSVTest
(ID INT,
FirstName VARCHAR(40),
LastName VARCHAR(40),
BirthDate SMALLDATETIME)
GO

Create a CSV file in drive C: with the name csvtest.txt and the following content. The location of the file is C:\csvtest.txt

1,James,Smith,19750101
2,Meggie,Smith,19790122
3,Robert,Smith,20071101
4,Alex,Smith,20040202

Now run the following script to load all the data from the CSV file into the database table. If a row contains an error, that row will not be inserted, but the other rows will be.

BULK
INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
--Check the content of the table.
SELECT *
FROM CSVTest
GO
--Drop the table to clean up database.
DROP TABLE CSVTest
GO
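One caveat worth adding (my note, not part of the original post): BULK INSERT stops after 10 formatting errors by default (the MAXERRORS default), so bad rows are only skipped up to that limit. A sketch of raising the limit and capturing the rejected rows (the error-file path is illustrative):

BULK INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
MAXERRORS = 100,
ERRORFILE = 'c:\csvtest_errors.log'
)
GO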

SQL: Monitor Deadlock in SQL 2012 using Extended Events

http://www.sqlservercentral.com/blogs/james-sql-footprint/2012/08/12/monitor-deadlock-in-sql-2012/

Do you still use trace flags 1204 and 1222 to monitor deadlocks? Or use Profiler to capture deadlocks? Now we are in SQL Server 2012! One of the biggest improvements in SQL 2012 is Extended Events.

Extended Events can replace SQL Profiler, and it is more powerful, with less performance impact, than SQL Profiler. Extended Events was introduced to the SQL Server world in SQL 2008, and in SQL 2012 it has been integrated into SQL Server Management Studio (SSMS). (The original post includes a screenshot of the SSMS session tree here.)

Now you can use SSMS to manage your Extended Events sessions. By default, there are two sessions created, "AlwaysOn_health" and "system_health"; "system_health" is started when the SQL Server service starts up. You can script the session and check the definition:
CREATE EVENT SESSION [system_health] ON SERVER
ADD EVENT sqlclr.clr_allocation_failure(
ACTION(package0.callstack,sqlserver.session_id)),
ADD EVENT sqlclr.clr_virtual_alloc_failure(
ACTION(package0.callstack,sqlserver.session_id)),
ADD EVENT sqlos.memory_broker_ring_buffer_recorded,
ADD EVENT sqlos.memory_node_oom_ring_buffer_recorded(
ACTION(package0.callstack,sqlserver.session_id,sqlserver.sql_text,sqlserver.tsql_stack)),
ADD EVENT sqlos.scheduler_monitor_deadlock_ring_buffer_recorded,
ADD EVENT sqlos.scheduler_monitor_non_yielding_iocp_ring_buffer_recorded,
ADD EVENT sqlos.scheduler_monitor_non_yielding_ring_buffer_recorded,
ADD EVENT sqlos.scheduler_monitor_non_yielding_rm_ring_buffer_recorded,
ADD EVENT sqlos.scheduler_monitor_stalled_dispatcher_ring_buffer_recorded,
ADD EVENT sqlos.scheduler_monitor_system_health_ring_buffer_recorded,
ADD EVENT sqlos.wait_info(
ACTION(package0.callstack,sqlserver.session_id,sqlserver.sql_text)
WHERE ([duration]>(15000) AND ([wait_type]>(31) AND ([wait_type]>(47) AND [wait_type]<(54) OR [wait_type]<(38) OR [wait_type]>(63) AND [wait_type]<(70) OR [wait_type]>(96) AND [wait_type]<(100) OR [wait_type]=(107) OR [wait_type]=(113) OR [wait_type]>(174) AND [wait_type]<(179) OR [wait_type]=(186) OR [wait_type]=(207) OR [wait_type]=(269) OR [wait_type]=(283) OR [wait_type]=(284)) OR [duration]>(30000) AND [wait_type]<(22)))),
ADD EVENT sqlos.wait_info_external(
ACTION(package0.callstack,sqlserver.session_id,sqlserver.sql_text)
WHERE ([duration]>(5000) AND ([wait_type]>(365) AND [wait_type]<(372) OR [wait_type]>(372) AND [wait_type]<(377) OR [wait_type]>(377) AND [wait_type]<(383) OR [wait_type]>(420) AND [wait_type]<(424) OR [wait_type]>(426) AND [wait_type]<(432) OR [wait_type]>(432) AND [wait_type]<(435) OR [duration]>(45000) AND ([wait_type]>(382) AND [wait_type]<(386) OR [wait_type]>(423) AND [wait_type]<(427) OR [wait_type]>(434) AND [wait_type]<(437) OR [wait_type]>(442) AND [wait_type]<(451) OR [wait_type]>(451) AND [wait_type]<(473) OR [wait_type]>(484) AND [wait_type]<(499) OR [wait_type]=(365) OR [wait_type]=(372) OR [wait_type]=(377) OR [wait_type]=(387) OR [wait_type]=(432) OR [wait_type]=(502))))),
ADD EVENT sqlserver.connectivity_ring_buffer_recorded(SET collect_call_stack=(1)),
ADD EVENT sqlserver.error_reported(
ACTION(package0.callstack,sqlserver.database_id,sqlserver.session_id,sqlserver.sql_text,sqlserver.tsql_stack)
WHERE ([severity]>=(20) OR ([error_number]=(17803) OR [error_number]=(701) OR [error_number]=(802) OR [error_number]=(8645) OR [error_number]=(8651) OR [error_number]=(8657) OR [error_number]=(8902)))),
ADD EVENT sqlserver.security_error_ring_buffer_recorded(SET collect_call_stack=(1)),
ADD EVENT sqlserver.sp_server_diagnostics_component_result(SET collect_data=(1)
WHERE ([sqlserver].[is_system]=(1) AND [component]<>(4))),
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file(SET filename=N'system_health.xel',max_file_size=(5),max_rollover_files=(4)),
ADD TARGET package0.ring_buffer(SET max_events_limit=(5000),max_memory=(4096))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=120 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO

So "system_health" will monitor the deadlock event by default. let’s try the script below to generate deadlock scenario