Fabiano Amorim

Arquivo

Arquivo do Autor

Fooling Statistics and Number of CPUs v1

24 de agosto de 2011 Fabiano Amorim Deixe um comentário

There are many cases when QO can create an execution plan that access the table using a join operator, and using parallelism. To see a sample I’ll show the index intersection.

It is used when SQL can join the result of two indexes to return the query.

An interesting sample could be the following query using AdventureWorksDWR2:

   SELECT [ProductAlternateKey], [ProductSubcategoryKey]

   FROM DimProduct

   WHERE [ProductAlternateKey] = 'BA-8327'

   OR [ProductSubcategoryKey] = 12

   OPTION (RECOMPILE)

The query is using the indexes by ProductAlternateKey and ProductSubcategoryKey on the table DimProduct to read all the data, joining the results using the Hash Join and then it applies the filter on the where clause.

Now to simulate a parallel plan, we can do some good tricks :-). First you can fool SQL about the number of CPUs that you have. Second you can fool SQL about the size of your table.

To fool SQL about the number of CPUs you can start the SQL using the parameter –P16.

SQL Server Configuration Manager:

Error Log file after restart SQL Server:

Alternatively you can check how many CPUs do you have using the DMV sys.dm_os_sys_info.

   SELECT cpu_count FROM sys.dm_os_sys_info

cpu_count

———–

And to fool SQL Server about the size of the table, we could change the statistics of the clustered index of the table. For that, we can use the UPDATE STATISTICS and specify the RowCount and PageCount options.

   UPDATE STATISTICS DimProduct [PK_DimProduct_ProductKey] WITH RowCount = 500000,

   PageCount = 100000

After run the update statistics our query starts to use parallelism to perform the reads and join using all CPUs in parallel.

   SELECT [ProductAlternateKey], [ProductSubcategoryKey]

   FROM DimProduct

   WHERE [ProductAlternateKey] = 'BA-8327'

   OR [ProductSubcategoryKey] = 12

   OPTION (RECOMPILE)

Categorias:Não categorizado

“Esconder” código do Profiler

24 de agosto de 2011 Fabiano Amorim Deixe um comentário

Estou em um cliente fazendo alguns trabalhos de segurança e precisei esconder um código SQL dos DBAs que tem permissão de ALTER TRACE (podem usar o Profiler), mas que não tem acesso aos objetos (procs, functions, views e etc…) do banco.

Ou seja, o cara pode ver os códigos enviados para o SQL Server, mas não tem acesso para manipulação dos objetos nos bancos de dados.

Uma maneira seria criptografar a procedure, isso faz com que o Profiler não mostre o código executado.

Se você não quiser criar uma proc para encapsular este código, você pode usar uma método, digamos que, alternativo. Alegre (quem nunca pecou que atire a primeira pedra. Eu poderia estar matando, roubando ou coisa pior, mas não, só estou criando um métodozinho alternativo).

Use a function EncryptByPassPhrase, veja um exemplo:

SELECT 'Teste Código que ninguem pode ver pelo Profiler'

WHERE EncryptByPassPhrase('','') <> ''

Veja o que aparece no Profiler:

Is it nice or what? Alegre

#Fica_a_dica

Categorias:Não categorizado

Undocumented,"STATISTICS_ONLY", "DBCC AUTOPILOT" and "SET AUTOPILOT"

24 de agosto de 2011 Fabiano Amorim 4 comentários

Introduction

As we know SQL Server uses a cost based optimizer to create the execution plans. That means SQL Server evaluates many possibilities to create an execution plan and chose for the plan with the lower cost.

One of the problems we have today is that if we want to predict how a query will perform adding a new index on the table, we need to wait for the creation of the index.

Sometimes waiting for the creation of the index is a nightmare, especially if the table is too big. Furthermore, after wait for 20 minutes for the creation of the index, when you go look at the query plan it is not using the index Smiley triste .

So the question is, how to create an hypothetically index? Just to test if the index really will be useful for the query.

WITH STATISTICS_ONLY

To create an hypothetically index you could use an undocumented syntax in the create index command.

For instance:

USE AdventureWorksDW
GO
CREATE INDEX ix_FirstName ON DimCustomer(FirstName) WITH STATISTICS_ONLY = -1
GO

This will create the index only with the statistics (density and histogram stuff). You could check the index using the sp_helpindex:

sp_HelpIndex DimCustomer

DBCC SHOW_STATISTICS(DimCustomer, ix_FirstName)

My friend Benjamin Nevares did a very good explanation about this here:

http://sqlblog.com/blogs/ben_nevarez/archive/2009/11/11/database-engine-tuning-advisor-and-the-query-optimizer.aspx

PS: If you create the index using “WITH STATISTICS_ONLY = 0” SQL will not create the statistic for your table. Just the hypothetical index.

DBCC AUTOPILOT and SET AUTOPILOT

Now that we have the hypothetical index, how to use it?

Let’s try to use the “index” hint :

SELECT * FROM DimCustomer WITH(index=ix_FirstName)

WHERE FirstName = N’Eugene’

Msg 308, Level 16, State 1, Line 1

Index ‘ix_FirstName’ on table ‘DimCustomer’ (specified in the FROM clause) does not exist.

What about use the indexid?

SELECT * FROM DimCustomer WITH(index=5)

WHERE FirstName = N’Eugene’

Msg 307, Level 16, State 1, Line 1

Index ID 5 on table ‘DimCustomer’ (specified in the FROM clause) does not exist.

So how we can create a new query plan that consider this index?

Now the fun stuff. Alegre

The DBCC AUTOPILOT is used to tell to the Optimizer to consider an specific index to the query plan creation. This DBCC plus the SET AUTOPILOT ON it’s the way to allow the use of the index.

Following is the syntax of the commands:

SET AUTOPILOT ON|OFF

/*

DBCC TRACEON (2588)

DBCC HELP(‘AUTOPILOT’)

*/

DBCC AUTOPILOT (typeid [, dbid [, {maxQueryCost | tabid [, indid [, pages [, flag [, rowcounts]]]]} ]])

Following is a complete scenario with an sample of the commands:

— Current Cost = 0,762133

— Clustered Index Scan on pk

SELECT * FROM DimCustomer

WHERE FirstName = N’Eugene’

GO

— creating the index

— DROP INDEX ix_FirstName ON DimCustomer

CREATE INDEX ix_FirstName ON DimCustomer(FirstName) WITH STATISTICS_ONLY = -1

GO

— Looking at the info necessary in the DBCC AUTOPILOT comand

SELECT name, id, Indid, Dpages, rowcnt

FROM sysindexes

WHERE id = object_id(‘DimCustomer’)

GO

DBCC AUTOPILOT (5, 9, 0, 0, 0, 0, 0) — Starting with the TypeID

DBCC AUTOPILOT (6, 9, 37575172, 1, 0, 0, 0) — Clustered Index with TypeID 6

DBCC AUTOPILOT (0, 9, 37575172, 2, 0, 0, 0) — All other index with TypeID 0

DBCC AUTOPILOT (0, 9, 37575172, 3, 0, 0, 0) — All other index with TypeID 0

DBCC AUTOPILOT (0, 9, 37575172, 5, 0, 0, 0) — All other index with TypeID 0

GO

SET AUTOPILOT ON

GO

— Query to create the estimated execution plan with the cost = 0,0750712

SELECT * FROM dbo.DimCustomer

WHERE FirstName = N’Eugene’

OPTION (RECOMPILE)

GO

SET AUTOPILOT OFF

GO

I realized that you can fool the Optimizer telling different numbers for the Pages and RowCount parameters of the DBCC AUTOPILOT. If you let it zero it use the Clustered index values.

Conclusion

There is a lot of mystery about this features, but I’m sure this will be a good start of tests for you.

I’m still playing with this, so be confortable to ask something or share a new discovery. Alegre

I don’t need to tell you to don’t use this is a production environment do I? This is a undocumented stuff so nobody can guaranty what it is really doing unless MS folks make it officially public and documented.

That’s all folks

Categorias:Não categorizado

Count(*), Count(1), Count(PK)

24 de agosto de 2011 Fabiano Amorim Deixe um comentário

Responda rápido! O que é melhor?

   1: SELECT COUNT(1)

   2:   FROM Produtos

   3: GO

   4: SELECT COUNT(*)

   5:   FROM Produtos

   6: GO

   7: SELECT COUNT(ID_Produto) -- PK

   8:   FROM Produtos

   9: GO

Tanto faz. Na verdade o Query Optimizer traduz tudo para COUNT(*).

O que ????

Pois é, quer que eu prove?

Fácil, veja os planos:

   1: SELECT COUNT(1) FROM Produtos

   2:   |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))

   3:        |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))

   4:             |--Clustered Index Scan(OBJECT:([Produtos].[xpk_Produtos]))

5:

   6: SELECT COUNT(*) FROM Produtos

   7:   |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))

   8:        |--Stream Aggregate(DEFINE:([Expr1004]= Count(*)))

   9:             |--Clustered Index Scan(OBJECT:([Produtos].[xpk_Produtos]))

10:

  11: SELECT COUNT(ID_Produto) FROM Produtos

  12:   |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))

  13:        |--Stream Aggregate(DEFINE:([Expr1004]= Count(*)))

Categorias:Não categorizado

SQL11, Denali, O que tem de novo?

24 de agosto de 2011 Fabiano Amorim Deixe um comentário

Como alguns já sabem o CTP1 do SQL Server 11 Codinome “Delani” foi lançado oficialmente esta semana no PASS.

Infelizmente a documentação liberada junto com o download do CTP1 tem pouquíssimas informações sobre o que realmente mudou e nesta versão.

Eu estou fazendo alguns testes e encontrei algumas coisas interessantes:

Novos Objetos

sp_server_diagnostics – Como o nome já diz, esta SP faz um diagnostico do servidor e retorna o resultado separados em 5 categorias.

System
Resource
Query_Processing
Is_subSystem
Events

Eu achei os resultados fantásticos, informações valiosíssimas sobre o que está acontecendo no seu servidor.

Segue um print da execução da proc e os resultados que ela apresenta.

Utilizei o XML Notepad para visualizar os dados XML da columa data, veja abaixo alguns prints:

DMV sys.dm_exec_describe_first_result_set, dm_exec_describe_first_result_set_for_object – Pelo que o que percebi, estas DMVs retornam um metadata completíssimo de uma determinada consulta ou objeto… por ex:

select * from sys.dm_exec_describe_first_result_set(‘SELECT * FROM Produtos ORDER BY ID_Produto DESC’, ”, 0)

O Interessante é que da pra passar um T-SQL com erro, dai você pode ler o erro nas colunas de erro…

select * from sys.dm_exec_describe_first_result_set(‘SELECT * FROM Produtos ORDER BY ID_Produto DESCC’, ”, 1)

CREATE PROC st_Test

AS

SELECT * FROM Pedidos

GO

select * from sys.dm_exec_describe_first_result_set_for_object(Object_ID(‘st_Test’), 0)

Novas regas para otimização das consultas, para encontrar estas regras utilizei a DVM sys.dm_exec_query_transformation_stats. Comparando a versão do SQL 11 com meu SQL 2008 R2.

Agora nos resta entender o que estas regras fazem 🙂

select * from sys.dm_exec_query_transformation_stats a

where not exists (select 1

from [NB_FABIANO\SQL2008R2_1].master.sys.dm_exec_query_transformation_stats b

where a.name = b.name)

Fique de olho.. logo devo postar mais sobre algumas novidades bem interessantes.. .

Abraços

Categorias:Não categorizado

IT Material VS Political Correctly Text

24 de agosto de 2011 Fabiano Amorim Deixe um comentário

I really love to writing about something especially about SQL Server, I also like to read good stuff and it is much better if the text is funny, I call this a “clean read”.

I’m not talking about a lot of jokes and nothing related to the content of the material, but sometimes the read is better when we see some comparisons with our life…

I’ve seen and tried to write good texts using this, for instance, comparisons with movies characters, famous quotes by someone (Developers, Developers, Developers… J) or predictions ("Two years from now, spam will be solved." Bill Gates, 2004) and so on.

One day in a chat with Fernando Guerreiro (Blog | Twitter) I asked: – Why text editors don’t like jokes? And he wisely said, Jokes don’t work at a global level.

I struggle with this problem myself and now I know that, if we want to reach global readers we need be very carefully about the different cultures. I also learn that It’s impossible write something that will receive 100% of approbations for everybody.

So now, some questions rises: – What do you think about it? Do we have to be political correctly and write regular texts? Do we have to write a book or an article using those things? Why IT books in general have to looks like a Manual, and even sometimes quite boring?

I would love to hear your opinion, please leave a comment about your thoughts.

Categorias:Não categorizado

Statistics VS Index Rebuild

24 de agosto de 2011 Fabiano Amorim Deixe um comentário

Hi folks, a question came to me asking about if is necessary update the statistics even after an index rebuild. Because the answer is it depends and we don’t like this answer I did some tests to show you when you need update statistics after a rebuild and when not.

As we know SQL Server could create a statistic if the Auto Update Statistics parameter is Enabled. These statistics are that one starting with “_WA_Sys…”, be aware that SQL Server do not auto-create multi-column statistics automatically(except with indexes). We also have statistics creates by Indexes, and statistics created manually with the CREATE STATISTICS command.

So, let’s check all of this… First I’ll create a table with four columns, and populate with some garbage data.

use tempdb

GO

IF OBJECT_ID(‘T1’) IS NOT NULL

DROP TABLE T1

GO

CREATE TABLE T1 (Col1 Int, Col2 Int, Col3 Int, Col4 Int)

GO

INSERT INTO T1 VALUES(1,1,1,1)

GO 1000

CREATE CLUSTERED INDEX ix_Col1 ON T1(Col1)

CREATE INDEX ix_Col2 ON T1(Col2)

CREATE STATISTICS stats_Col3 ON T1(Col1)

GO

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

As you can see we have 3 statistics for the table T1. Two for the index and one called stats_Col3. Now in order to create an auto-statistics I’ll run the following query using the column C4 into the where condition.

SELECT * FROM T1

WHERE Col4 = 1

GO

If we check the sysindexes again, now we’ll see the auto-created statistic.

Following let’s check some different scenarios to rebuild an index.

Scenario 1

Now, the following script runs a DBCC REINDEX to rebuild the Clustered Index ix_Col1.

— Scenario 1: DBCC REINDEX in the Clustered Index

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

DBCC DBREINDEX(T1, ix_Col1)

GO

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

As we can see in the picture above, the statistics for the indexes were updated, but the auto-created and the manually don’t.

Scenario 2

— Scenario 2: DBCC REINDEX in the NonClustered Index

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

DBCC DBREINDEX(T1, ix_Col2)

GO

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

As expected now the only statistic updated was the ix_Col2.

Scenario 3

— Scenario 3: DBCC REINDEX without specify the index. That means all index must be updated.

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

DBCC DBREINDEX(T1)

GO

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

Now all statistics were updated, including the auto and manually created.

Scenario 4

— Scenario 4: ALTER INDEX to update the Clustered Index

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

ALTER INDEX ix_Col1 ON T1 REBUILD;

GO

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

Here, look that just to use the ALTER INDEX instead the DBCC REINDEX things are different. In the Scenario 1 we got the two index Updated, and using the ALTER INDEX we don’t, just the ix_Col1 was updated.

Scenario 5

— Scenario 5: ALTER INDEX to update the NonClustered Index

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

ALTER INDEX ix_Col2 ON T1 REBUILD;

GO

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

As expected now the only statistic updated was the ix_Col2. This is the same behavior that the Scenario 2.

Scenario 6

— Scenario 6: ALTER INDEX to update the ALL Index

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

ALTER INDEX ALL ON T1 REBUILD;

GO

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

Now both index statistics were update, but the auto and manually created don’t. Note that we expect that this act like the Scenario 3 where we use the DBCC REINDEX, but this is don’t have the same behavior.

It depends

As you can see in the scenarios above it depends to the command you are using may some statistics are updated and some not. But that’s is not all J, it also depends the SQL Server version you are using. I ran the scripts above into SQL Server 2008 R2 and SQL Server 2008 and the results were the same, but when I ran into SQL Server 2005 SP3 I got a slight difference.

Following we can see that the Scenario 1 don’t update all index statistics, just the clustered index.

— Scenario 1: DBCC REINDEX in the Clustered Index

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

DBCC DBREINDEX(T1, ix_Col1)

GO

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

The big surprise was when I run it into SQL Server 2000 SP4. The Scenario 1 acts very different to the other versions.

— Scenario 1: DBCC REINDEX in the Clustered Index

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

DBCC DBREINDEX(T1, ix_Col1)

GO

SELECT name, Stats_date(id, indid) AS LastUpdate

FROM sysindexes

WHERE id = Object_id(‘T1’)

GO

As you can see, when you update the Clustered Index all statistics are updated, including the auto and manually created.

Resume

Following are a resume of the scenarios.

Categorias:Não categorizado

Cursos T-SQL da SolidQ no Brasil

24 de agosto de 2011 Fabiano Amorim Deixe um comentário

Pessoal, é com GRANDE prazer que falo a vocês que estamos trazendo os cursos de T-SQL da SolidQ para o Brasil.

Eu estou trabalhando na tradução do material de 2 cursos criados pelo nosso amigo e gênio Itzik Ben-Gan, vai por mim, tem muita coisa boa, e informações privilegiadas que você só verá no curso… vale muito a pena.

Portanto se você é DBA ou Desenvolvedor, não perca a chance…. logo estarei publicando mais informações sobre isso, fique de olho.

É claro que o treinamento no Brasil terá alguns “toques” especiais que irei incluir no curso Alegre , com minha experiência de como usamos o T-SQL no Brasil mais o material da SolidQ, não tenho dúvida que iremos ministrar o melhor curso de T-SQL da América Latina.

Por enquanto para aguçar sua vontade, segue uma breve descrição dos dois cursos de T-SQL.

Fundamentos do T-SQL (Não se deixe enganar pelo nome, é um curso com muitos exemplos nível 400)

Os tópicos cobertos por esse curso incluem: Background para Consultas e Programação em T-SQL; Consultas em uma única tabela; Joins; Sub-Consultas; Table Expressions; Operações com Conjuntos; Pivot, Unpivot e Groupins Sets; Modificação de Dados; Transações; Overview dos Objetos de Programação.
Para complementar a experiência de aprendizado os alunos receberão exercícios que irão habilitá-los de praticar o que eles aprenderam.

T-SQL Querying Avançado, Programação e Tuning para SQL Server 2005 e 2008 (Nivel 400)

Este curso é direcionado em escrever e aperfeiçoar consultas e programação com T-SQL no SQL Server 2005 e 2008. Neste curso você irá aprender os detalhes e capacidades do T-SQL nas seguintes áreas: Logical Query Processing, Query Tunning, SubQueries, Ranking, Functions, Joins e Grouping sets; Aggregating e Pivot de dados; TOP and APPLY; Modificações de dados; Problemas relacionados a Data Types; Objects Programáveis (SQL dinâmico, Views, User Defined Functions, Stored Procedures, Triggers, Transações e Concorrência, Controle de Exceções); Árvores e Hierarquias.

Galera, imperdível… agora deixo outra supresa… Estou preparando um curso e um seminário de Indexação e Execution Plans… logo irei publicar o conteudo do curso.

Agora deixo uma pergunta… você gostaria de fazer estes cursos? Responda aqui no blog com um comentário…

Abraços

Categorias:Não categorizado

Memory Grant, Sort Warnings

24 de agosto de 2011 Fabiano Amorim Deixe um comentário

I was doing some tests with memory grant,sort warnings and amount of avaliable memory, so here are some pictures of my tests…

When I wrote this is an indication, I don´t mean 100% of certain that if you add more memory you will have a performance gain. But sort warnings can indicate memory pressure.

For instance, suppose a very concurrently scenario, with many queries requiring memory grant. When I run a query the server calculates the memory limit for that query. By default, this is 25% (20% on 32bit SQL 2005) of total query memory (which is set by memory broker as about 90% of server memory). This per-query limit helps to prevent one query from dominating the whole server.

Following is some picture of a simulation of this scenario.

1024 MB of Memory

EXEC sys.sp_configure N’max server memory (MB)’, N’1024′

GO

RECONFIGURE WITH OVERRIDE

GO

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

GO

512 MB of Memory

EXEC sys.sp_configure N’max server memory (MB)’, N’512′

GO

RECONFIGURE WITH OVERRIDE

GO

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

GO

256 MB of Memory

EXEC sys.sp_configure N’max server memory (MB)’, N’256′

GO

RECONFIGURE WITH OVERRIDE

GO

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

GO

64 MB of Memory

As you can see according to the memory change, the SQL can´t grant memory to some queries. And with just a few memory available the sort is exhibited.

Instead of change the available memory I could add more threads into SQLQueryStress to simulate the concurrency into the server, if you do this, may you see

some Sort Warnings into Profiler.

I’m not saying that you are wrong, I totally agree with you that SQL uses cardinality to estimate the amount of memory to the query, and the main problem is, a wrong estimation can lead to a bad memory grant. The point here is, yes, the sort warning can indicate a memory pressure.

SQLQueryStrees by Adam Machanic http://www.datamanipulation.net

Categorias:Não categorizado